2025-12-04T09:15:55.9077126Z Current runner version: '2.330.0'
2025-12-04T09:15:55.9082906Z Runner name: 'i-0f694664a515f0ebd'
2025-12-04T09:15:55.9083674Z Runner group name: 'default'
2025-12-04T09:15:55.9084551Z Machine name: 'ip-10-0-18-14'
2025-12-04T09:15:55.9087305Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T09:15:55.9089461Z Contents: read
2025-12-04T09:15:55.9089974Z Metadata: read
2025-12-04T09:15:55.9090464Z ##[endgroup]
2025-12-04T09:15:55.9092393Z Secret source: Actions
2025-12-04T09:15:55.9093046Z Prepare workflow directory
2025-12-04T09:15:55.9605638Z Prepare all required actions
2025-12-04T09:15:55.9643085Z Getting action download info
2025-12-04T09:15:56.2863729Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T09:15:58.7014998Z Download action repository 'pytorch/pytorch@main' (SHA:7716da9fb23f27a65b41f9f016a2afadf281c18f)
2025-12-04T09:16:14.9557511Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T09:16:15.3701375Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T09:16:15.6139732Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T09:16:15.7957413Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:16:16.0372084Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T09:16:16.4034002Z Getting action download info
2025-12-04T09:16:16.5222593Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T09:16:16.8089829Z Getting action download info
2025-12-04T09:16:16.9367573Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T09:16:17.1795883Z Getting action download info
2025-12-04T09:16:17.3150873Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T09:16:17.5240137Z Getting action download info
2025-12-04T09:16:17.6586135Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T09:16:17.6589854Z ##[group] Inputs
2025-12-04T09:16:17.6590261Z   build-environment: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:16:17.6600090Z   test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T09:16:17.6610734Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:16:17.6625433Z   sync-tag: 
2025-12-04T09:16:17.6626581Z   timeout-minutes: 300
2025-12-04T09:16:17.6626830Z   use-gha: 
2025-12-04T09:16:17.6627034Z   dashboard-tag: 
2025-12-04T09:16:17.6627290Z   s3-bucket: gha-artifacts
2025-12-04T09:16:17.6627572Z   aws-role-to-assume: 
2025-12-04T09:16:17.6628319Z   disable-monitor: false
2025-12-04T09:16:17.6628615Z   monitor-log-interval: 5
2025-12-04T09:16:17.6628934Z   monitor-data-collect-interval: 1
2025-12-04T09:16:17.6629248Z ##[endgroup]
2025-12-04T09:16:17.6629961Z Complete job name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:16:17.7300420Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T09:16:17.7400795Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T09:16:17.7412339Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:16:17.7412939Z ##[endgroup]
2025-12-04T09:16:19.2751326Z Runner Type: linux.g5.4xlarge.nvidia.gpu
2025-12-04T09:16:19.2751790Z Instance Type: g5.4xlarge
2025-12-04T09:16:19.2752038Z AMI Name: unknown
2025-12-04T09:16:19.2803825Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T09:16:24.7735937Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T09:16:24.7736364Z with:
2025-12-04T09:16:24.7736852Z   github-secret: ***
2025-12-04T09:16:24.7737578Z   instructions: All testing is done inside the container, to start an interactive session run:
  docker exec -it $(docker container ps --format '{{.ID}}') bash

2025-12-04T09:16:24.7738375Z   activate-with-label: false
2025-12-04T09:16:24.7738646Z   label: with-ssh
2025-12-04T09:16:24.7738879Z   remove-existing-keys: true
2025-12-04T09:16:24.7739219Z   fail-silently: true
2025-12-04T09:16:24.7739452Z env:
2025-12-04T09:16:24.7739640Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.7739915Z ##[endgroup]
2025-12-04T09:16:24.9131011Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T09:16:24.9132256Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T09:16:24.9315415Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T09:16:24.9315847Z with:
2025-12-04T09:16:24.9316054Z   no-sudo: true
2025-12-04T09:16:24.9316282Z   submodules: recursive
2025-12-04T09:16:24.9316536Z   fetch-depth: 0
2025-12-04T09:16:24.9316965Z env:
2025-12-04T09:16:24.9317165Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9317417Z ##[endgroup]
2025-12-04T09:16:24.9389287Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:16:24.9390278Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:16:24.9405450Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:16:24.9405857Z env:
2025-12-04T09:16:24.9406089Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9406415Z ##[endgroup]
2025-12-04T09:16:24.9520887Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T09:16:24.9521330Z # Use all available CPUs for fetching
2025-12-04T09:16:24.9521672Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:16:24.9522005Z git config --global fetch.parallel 0
2025-12-04T09:16:24.9522396Z git config --global submodule.fetchJobs 0
2025-12-04T09:16:24.9522751Z
2025-12-04T09:16:24.9523104Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T09:16:24.9523584Z # do it here as well just in case
2025-12-04T09:16:24.9523900Z if [[ -d .git ]]; then
2025-12-04T09:16:24.9524181Z   if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:16:24.9524489Z     sudo git clean -ffdx
2025-12-04T09:16:24.9524761Z   else
2025-12-04T09:16:24.9524980Z     git clean -ffdx
2025-12-04T09:16:24.9525232Z   fi
2025-12-04T09:16:24.9525434Z fi
2025-12-04T09:16:24.9534771Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:16:24.9535145Z env:
2025-12-04T09:16:24.9535408Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9535676Z   NO_SUDO: true
2025-12-04T09:16:24.9535888Z ##[endgroup]
2025-12-04T09:16:24.9675994Z ##[group]Run actions/checkout@v4
2025-12-04T09:16:24.9676276Z with:
2025-12-04T09:16:24.9676536Z   ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:16:24.9676874Z   fetch-depth: 0
2025-12-04T09:16:24.9677097Z   submodules: recursive
2025-12-04T09:16:24.9677343Z   show-progress: false
2025-12-04T09:16:24.9677605Z   repository: pytorch/pytorch
2025-12-04T09:16:24.9677979Z   token: ***
2025-12-04T09:16:24.9678191Z   ssh-strict: true
2025-12-04T09:16:24.9678420Z   ssh-user: git
2025-12-04T09:16:24.9678655Z   persist-credentials: true
2025-12-04T09:16:24.9678957Z   clean: true
2025-12-04T09:16:24.9679224Z   sparse-checkout-cone-mode: true
2025-12-04T09:16:24.9679518Z   fetch-tags: false
2025-12-04T09:16:24.9679742Z   lfs: false
2025-12-04T09:16:24.9679963Z   set-safe-directory: true
2025-12-04T09:16:24.9680236Z env:
2025-12-04T09:16:24.9680433Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9680678Z ##[endgroup]
2025-12-04T09:16:25.0772186Z Syncing repository: pytorch/pytorch
2025-12-04T09:16:25.0773537Z ##[group]Getting Git version info
2025-12-04T09:16:25.0774008Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:16:25.0774683Z [command]/usr/bin/git version
2025-12-04T09:16:25.0973377Z git version 2.50.1
2025-12-04T09:16:25.0999017Z ##[endgroup]
2025-12-04T09:16:25.1009978Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/35b33208-1641-45ab-8ee2-11b904f686c5/.gitconfig'
2025-12-04T09:16:25.1075037Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/35b33208-1641-45ab-8ee2-11b904f686c5' before making global git config changes
2025-12-04T09:16:25.1076077Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T09:16:25.1080646Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:16:25.1137046Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:16:25.1140707Z ##[group]Initializing the repository
2025-12-04T09:16:25.1145212Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:16:25.1224863Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T09:16:25.1225502Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T09:16:25.1226076Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T09:16:25.1226491Z hint:
2025-12-04T09:16:25.1226783Z hint: 	git config --global init.defaultBranch <name>
2025-12-04T09:16:25.1227137Z hint:
2025-12-04T09:16:25.1227468Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T09:16:25.1228055Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T09:16:25.1228490Z hint:
2025-12-04T09:16:25.1228700Z hint: 	git branch -m <name>
2025-12-04T09:16:25.1228952Z hint:
2025-12-04T09:16:25.1229323Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T09:16:25.1235361Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T09:16:25.1248141Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T09:16:25.1297716Z ##[endgroup]
2025-12-04T09:16:25.1298150Z ##[group]Disabling automatic garbage collection
2025-12-04T09:16:25.1301603Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T09:16:25.1335866Z ##[endgroup]
2025-12-04T09:16:25.1336252Z ##[group]Setting up auth
2025-12-04T09:16:25.1342391Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T09:16:25.1376940Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T09:16:25.1808527Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T09:16:25.1842660Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T09:16:25.2235948Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T09:16:25.2272370Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T09:16:25.2665003Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:16:25.2716400Z ##[endgroup]
2025-12-04T09:16:25.2717045Z ##[group]Fetching the repository
2025-12-04T09:16:25.2724372Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T09:17:17.9961311Z From https://github.com/pytorch/pytorch
2025-12-04T09:17:17.9962044Z  * [new branch]              2.6.0.dev20241004+          -> origin/2.6.0.dev20241004+
2025-12-04T09:17:17.9962754Z  * [new branch]              2.9.1                       -> origin/2.9.1
2025-12-04T09:17:17.9963348Z  * [new branch]              AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T09:17:17.9964279Z  * [new branch]              Flamefire-patch-1           -> origin/Flamefire-patch-1
2025-12-04T09:17:17.9964920Z  * [new branch]              HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T09:17:17.9966578Z  * [new branch]              HOPrintFunc                 -> origin/HOPrintFunc
2025-12-04T09:17:17.9970116Z  * [new branch]              IvanKobzarev/stack/1        -> origin/IvanKobzarev/stack/1
2025-12-04T09:17:17.9972944Z  * [new branch]              NicoshevSVE128              -> origin/NicoshevSVE128
2025-12-04T09:17:17.9975420Z  * [new branch]              PR-AOTInductorNoneBug       -> origin/PR-AOTInductorNoneBug
2025-12-04T09:17:17.9977085Z  * [new branch]              PR-AOTInductorNoneBugFix    -> origin/PR-AOTInductorNoneBugFix
2025-12-04T09:17:17.9978844Z  * [new branch]              PR-FixConfigsIssue          -> origin/PR-FixConfigsIssue
2025-12-04T09:17:17.9981013Z  * [new branch]              PR-NoneBugFix-viable        -> origin/PR-NoneBugFix-viable
2025-12-04T09:17:17.9982698Z  * [new branch]              PR-ResetToZero              -> origin/PR-ResetToZero
2025-12-04T09:17:17.9985027Z  * [new branch]              Update-Flash-Packaging      -> origin/Update-Flash-Packaging
2025-12-04T09:17:17.9986538Z  * [new branch]              VLA_exp                     -> origin/VLA_exp
2025-12-04T09:17:17.9988796Z  * [new branch]              activation_bench            -> origin/activation_bench
2025-12-04T09:17:17.9990618Z  * [new branch]              addmm-heuristic             -> origin/addmm-heuristic
2025-12-04T09:17:17.9993246Z  * [new branch]              adi/onednn_aarch64          -> origin/adi/onednn_aarch64
2025-12-04T09:17:17.9995032Z  * [new branch]              adi/test                    -> origin/adi/test
2025-12-04T09:17:17.9997038Z  * [new branch]              adi/test_bgemm              -> origin/adi/test_bgemm
2025-12-04T09:17:17.9998876Z  * [new branch]              adi/test_m8g                -> origin/adi/test_m8g
2025-12-04T09:17:18.0000668Z  * [new branch]              adi/test_onednn             -> origin/adi/test_onednn
2025-12-04T09:17:18.0002531Z  * [new branch]              adi/test_onednn_v3.9        -> origin/adi/test_onednn_v3.9
2025-12-04T09:17:18.0004279Z  * [new branch]              adi/test_presve_change      -> origin/adi/test_presve_change
2025-12-04T09:17:18.0006121Z  * [new branch]              adi/test_timm               -> origin/adi/test_timm
2025-12-04T09:17:18.0008724Z  * [new branch]              adi/testpresve_change       -> origin/adi/testpresve_change
2025-12-04T09:17:18.0014462Z  * [new branch]              aditew01/test/vec_bf16      -> origin/aditew01/test/vec_bf16
2025-12-04T09:17:18.0015154Z  * [new branch]              ah-globalfeedback-hook      -> origin/ah-globalfeedback-hook
2025-12-04T09:17:18.0017696Z  * [new branch]              albanD-patch-1              -> origin/albanD-patch-1
2025-12-04T09:17:18.0019714Z  * [new branch]              also-surround-shimh         -> origin/also-surround-shimh
2025-12-04T09:17:18.0022461Z  * [new branch]              angelayi/aot_compile        -> origin/angelayi/aot_compile
2025-12-04T09:17:18.0024424Z  * [new branch]              angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-12-04T09:17:18.0026268Z  * [new branch]              angelayi/benchmark          -> origin/angelayi/benchmark
2025-12-04T09:17:18.0028319Z  * [new branch]              angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-12-04T09:17:18.0029784Z  * [new branch]              angelayi/cpp_loader         -> origin/angelayi/cpp_loader
2025-12-04T09:17:18.0031837Z  * [new branch]              angelayi/inductor_const     -> origin/angelayi/inductor_const
2025-12-04T09:17:18.0033642Z  * [new branch]              angelayi/lstm               -> origin/angelayi/lstm
2025-12-04T09:17:18.0036286Z  * [new branch]              angelayi/no_so_weight       -> origin/angelayi/no_so_weight
2025-12-04T09:17:18.0038813Z  * [new branch]              angelayi/scan_layers        -> origin/angelayi/scan_layers
2025-12-04T09:17:18.0040735Z  * [new branch]              angelayi/side_eff           -> origin/angelayi/side_eff
2025-12-04T09:17:18.0042717Z  * [new branch]              angelayi/state_dict         -> origin/angelayi/state_dict
2025-12-04T09:17:18.0044844Z  * [new branch]              angelayi/symint_input       -> origin/angelayi/symint_input
2025-12-04T09:17:18.0046863Z  * [new branch]              angelayi/symm_mem           -> origin/angelayi/symm_mem
2025-12-04T09:17:18.0048697Z  * [new branch]              angelayi/test_cpp           -> origin/angelayi/test_cpp
2025-12-04T09:17:18.0051253Z  * [new branch]              angelayi/torch_size         -> origin/angelayi/torch_size
2025-12-04T09:17:18.0053137Z  * [new branch]              annotate_assert             -> origin/annotate_assert
2025-12-04T09:17:18.0055226Z  * [new branch]              annotate_fallback_kernel    -> origin/annotate_fallback_kernel
2025-12-04T09:17:18.0057292Z  * [new branch]              annotation_deepcopy         -> origin/annotation_deepcopy
2025-12-04T09:17:18.0059176Z  * [new branch]              annotation_dynamo           -> origin/annotation_dynamo
2025-12-04T09:17:18.0061115Z  * [new branch]              aot_eager_stack_trace       -> origin/aot_eager_stack_trace
2025-12-04T09:17:18.0062977Z  * [new branch]              aoti-cuda-alloc             -> origin/aoti-cuda-alloc
2025-12-04T09:17:18.0064879Z  * [new branch]              aoti_const_device           -> origin/aoti_const_device
2025-12-04T09:17:18.0066773Z  * [new branch]              aoti_fqn_name_interface     -> origin/aoti_fqn_name_interface
2025-12-04T09:17:18.0068626Z  * [new branch]              aoti_package_weights_binary -> origin/aoti_package_weights_binary
2025-12-04T09:17:18.0070435Z  * [new branch]              aoti_target_windows         -> origin/aoti_target_windows
2025-12-04T09:17:18.0073805Z  * [new branch]              arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling
2025-12-04T09:17:18.0075646Z  * [new branch]              async_tp                    -> origin/async_tp
2025-12-04T09:17:18.0077691Z  * [new branch]              atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-12-04T09:17:18.0079665Z  * [new branch]              atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-12-04T09:17:18.0081723Z  * [new branch]              atalman-patch-2             -> origin/atalman-patch-2
2025-12-04T09:17:18.0102889Z  * [new branch]              atalman-patch-3             -> origin/atalman-patch-3
2025-12-04T09:17:18.0103509Z  * [new branch]              atalman-patch-4             -> origin/atalman-patch-4
2025-12-04T09:17:18.0104069Z  * [new branch]              atalman-patch-5             -> origin/atalman-patch-5
2025-12-04T09:17:18.0104615Z  * [new branch]              atalman-patch-6             -> origin/atalman-patch-6
2025-12-04T09:17:18.0105175Z  * [new branch]              atalman-patch-7             -> origin/atalman-patch-7
2025-12-04T09:17:18.0105693Z  * [new branch]              atalman-patch-8             -> origin/atalman-patch-8
2025-12-04T09:17:18.0106332Z  * [new branch]              atalman_inductor_2.3.1      -> origin/atalman_inductor_2.3.1
2025-12-04T09:17:18.0107081Z  * [new branch]              atalman_inductor_2.4.0      -> origin/atalman_inductor_2.4.0
2025-12-04T09:17:18.0108063Z  * [new branch]              atalman_inductor_2.4.x      -> origin/atalman_inductor_2.4.x
2025-12-04T09:17:18.0108760Z  * [new branch]              attention_benchmarking_clean -> origin/attention_benchmarking_clean
2025-12-04T09:17:18.0109424Z  * [new branch]              bahuang/dt_fix_scalar_add   -> origin/bahuang/dt_fix_scalar_add
2025-12-04T09:17:18.0110181Z  * [new branch]              bahuang/fix_debug_mode      -> origin/bahuang/fix_debug_mode
2025-12-04T09:17:18.0110851Z  * [new branch]              bahuang/fix_expand          -> origin/bahuang/fix_expand
2025-12-04T09:17:18.0111366Z  * [new branch]              bahuang/test                -> origin/bahuang/test
2025-12-04T09:17:18.0112368Z  * [new branch]              base/1.5                    -> origin/base/1.5
2025-12-04T09:17:18.0114873Z  * [new branch]              batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-12-04T09:17:18.0116288Z  * [new branch]              bench_scaled_mm_ops         -> origin/bench_scaled_mm_ops
2025-12-04T09:17:18.0118746Z  * [new branch]              benchmark-updates           -> origin/benchmark-updates
2025-12-04T09:17:18.0120209Z  * [new branch]              benchmarking-script         -> origin/benchmarking-script
2025-12-04T09:17:18.0123113Z  * [new branch]              bertmaher/pinbump26         -> origin/bertmaher/pinbump26
2025-12-04T09:17:18.0125719Z  * [new branch]              bertrand/cutlass            -> origin/bertrand/cutlass
2025-12-04T09:17:18.0128257Z  * [new branch]              bf/bug-static-input         -> origin/bf/bug-static-input
2025-12-04T09:17:18.0129743Z  * [new branch]              bf/cg-backend               -> origin/bf/cg-backend
2025-12-04T09:17:18.0131732Z  * [new branch]              bf/cg-nccl-test             -> origin/bf/cg-nccl-test
2025-12-04T09:17:18.0133579Z  * [new branch]              bf/cg-remove-check          -> origin/bf/cg-remove-check
2025-12-04T09:17:18.0135574Z  * [new branch]              bf/clean-torchbench-hf      -> origin/bf/clean-torchbench-hf
2025-12-04T09:17:18.0137100Z  * [new branch]              bf/combo-debug-log          -> origin/bf/combo-debug-log
2025-12-04T09:17:18.0139089Z  * [new branch]              bf/cudagraph                -> origin/bf/cudagraph
2025-12-04T09:17:18.0141722Z  * [new branch]              bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-12-04T09:17:18.0143663Z  * [new branch]              bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-12-04T09:17:18.0145123Z  * [new branch]              bf/cudagraph-partition      -> origin/bf/cudagraph-partition
2025-12-04T09:17:18.0147367Z  * [new branch]              bf/donated-buffer-bench     -> origin/bf/donated-buffer-bench
2025-12-04T09:17:18.0149314Z  * [new branch]              bf/dynamo-partition         -> origin/bf/dynamo-partition
2025-12-04T09:17:18.0151154Z  * [new branch]              bf/lite                     -> origin/bf/lite
2025-12-04T09:17:18.0153076Z  * [new branch]              bf/pa-non-divisible         -> origin/bf/pa-non-divisible
2025-12-04T09:17:18.0155175Z  * [new branch]              bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols
2025-12-04T09:17:18.0157624Z  * [new branch]              bf/partition-memory-plan    -> origin/bf/partition-memory-plan
2025-12-04T09:17:18.0159492Z  * [new branch]              bf/partition-move-cpu       -> origin/bf/partition-move-cpu
2025-12-04T09:17:18.0161455Z  * [new branch]              bf/partition-view-fallback  -> origin/bf/partition-view-fallback
2025-12-04T09:17:18.0163077Z  * [new branch]              bf/remove-check-55b0c39d    -> origin/bf/remove-check-55b0c39d
2025-12-04T09:17:18.0165091Z  * [new branch]              bf/timm-nov-26-2025         -> origin/bf/timm-nov-26-2025
2025-12-04T09:17:18.0167038Z  * [new branch]              bf/transformer-pin-4-57-3   -> origin/bf/transformer-pin-4-57-3
2025-12-04T09:17:18.0168975Z  * [new branch]              bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-12-04T09:17:18.0170613Z  * [new branch]              bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-12-04T09:17:18.0172529Z  * [new branch]              bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-12-04T09:17:18.0174448Z  * [new branch]              bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-12-04T09:17:18.0176282Z  * [new branch]              bisect_perf_hf_T5_5268754e  -> origin/bisect_perf_hf_T5_5268754e
2025-12-04T09:17:18.0178201Z  * [new branch]              bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-12-04T09:17:18.0179855Z  * [new branch]              bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-12-04T09:17:18.0181857Z  * [new branch]              bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-12-04T09:17:18.0183825Z  * [new branch]              bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-12-04T09:17:18.0186045Z  * [new branch]              bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-12-04T09:17:18.0187531Z  * [new branch]              bisect_perf_hf_T5_d65f194a  -> origin/bisect_perf_hf_T5_d65f194a
2025-12-04T09:17:18.0189506Z  * [new branch]              bisect_perf_hf_T5_da94ab0b  -> origin/bisect_perf_hf_T5_da94ab0b
2025-12-04T09:17:18.0191257Z  * [new branch]              bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-12-04T09:17:18.0193161Z  * [new branch]              bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-12-04T09:17:18.0194907Z  * [new branch]              bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-12-04T09:17:18.0196837Z  * [new branch]              bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-12-04T09:17:18.0199400Z  * [new branch]              brister/fx_device_type      -> origin/brister/fx_device_type
2025-12-04T09:17:18.0201262Z  * [new branch]              brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx
2025-12-04T09:17:18.0202978Z  * [new branch]              brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-12-04T09:17:18.0204858Z  * [new branch]              bwd-backup                  -> origin/bwd-backup
2025-12-04T09:17:18.0206898Z  * [new branch]              c57382a49                   -> origin/c57382a49
2025-12-04T09:17:18.0208868Z  * [new branch]              ca_0431d47eaa               -> origin/ca_0431d47eaa
2025-12-04T09:17:18.0210859Z  * [new branch]              ca_fix_0431d47eaa           -> origin/ca_fix_0431d47eaa
2025-12-04T09:17:18.0213550Z  * [new branch]              camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-12-04T09:17:18.0215478Z  * [new branch]              cccclai-patch-1             -> origin/cccclai-patch-1
2025-12-04T09:17:18.0217576Z  * [new branch]              cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-12-04T09:17:18.0219399Z  * [new branch]              cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-12-04T09:17:18.0221944Z  * [new branch]              cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_
2025-12-04T09:17:18.0223633Z  * [new branch]              cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_
2025-12-04T09:17:18.0225789Z  * [new branch]              cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_
2025-12-04T09:17:18.0227911Z  * [new branch]              cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_
2025-12-04T09:17:18.0229573Z  * [new branch]              cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_
2025-12-04T09:17:18.0231671Z  * [new branch]              cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_
2025-12-04T09:17:18.0233753Z  * [new branch]              cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_
2025-12-04T09:17:18.0235521Z  * [new branch]              cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_
2025-12-04T09:17:18.0237659Z  * [new branch]              cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_
2025-12-04T09:17:18.0239366Z  * [new branch]              cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_
2025-12-04T09:17:18.0241174Z  * [new branch]              cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_
2025-12-04T09:17:18.0243359Z  * [new branch]              cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_
2025-12-04T09:17:18.0245304Z  * [new branch]              cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_
2025-12-04T09:17:18.0247025Z  * [new branch]              cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_
2025-12-04T09:17:18.0249198Z  * [new branch]              cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_
2025-12-04T09:17:18.0251021Z  * [new branch]              cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_
2025-12-04T09:17:18.0253118Z  * [new branch]              cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_
2025-12-04T09:17:18.0254715Z  * [new branch]              cherry_pick_166036_166040   -> origin/cherry_pick_166036_166040
2025-12-04T09:17:18.0256802Z  * [new branch]              cherry_pick_166457          -> origin/cherry_pick_166457
2025-12-04T09:17:18.0258756Z  * [new branch]              cherrypick_166338           -> origin/cherrypick_166338
2025-12-04T09:17:18.0260887Z  * [new branch]              cherrypick_166458           -> origin/cherrypick_166458
2025-12-04T09:17:18.0262435Z  * [new branch]              cherrypick_166586           -> origin/cherrypick_166586
2025-12-04T09:17:18.0264472Z  * [new branch]              cherrypick_166956           -> origin/cherrypick_166956
2025-12-04T09:17:18.0266386Z  * [new branch]              ci_attn                     -> origin/ci_attn
2025-12-04T09:17:18.0268283Z  * [new branch]              codex-testing               -> origin/codex-testing
2025-12-04T09:17:18.0271273Z  * [new branch]              codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions
2025-12-04T09:17:18.0272604Z  * [new branch]              codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-12-04T09:17:18.0275320Z  * [new branch]              codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id
2025-12-04T09:17:18.0277326Z  * [new branch]              codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-12-04T09:17:18.0278836Z  * [new branch]              compatiblpy39util           -> origin/compatiblpy39util
2025-12-04T09:17:18.0280888Z  * [new branch]              cond_hop_device             -> origin/cond_hop_device
2025-12-04T09:17:18.0282796Z  * [new branch]              context_test                -> origin/context_test
2025-12-04T09:17:18.0285609Z  * [new branch]              copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip
2025-12-04T09:17:18.0287993Z  * [new branch]              cpio/fix_new_ami_tests      -> origin/cpio/fix_new_ami_tests
2025-12-04T09:17:18.0289978Z  * [new branch]              cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade
2025-12-04T09:17:18.0292658Z  * [new branch]              crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering
2025-12-04T09:17:18.0295087Z  * [new branch]              csl/always_produce_xml      -> origin/csl/always_produce_xml
2025-12-04T09:17:18.0296630Z  * [new branch]              csl/build_test_more_procs   -> origin/csl/build_test_more_procs
2025-12-04T09:17:18.0298568Z  * [new branch]              csl/build_test_more_procs2  -> origin/csl/build_test_more_procs2
2025-12-04T09:17:18.0300523Z  * [new branch]              csl/clean_up                -> origin/csl/clean_up
2025-12-04T09:17:18.0302372Z  * [new branch]              csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit
2025-12-04T09:17:18.0303869Z  * [new branch]              csl/katex                   -> origin/csl/katex
2025-12-04T09:17:18.0306138Z  * [new branch]              csl/larger_runner           -> origin/csl/larger_runner
2025-12-04T09:17:18.0308546Z  * [new branch]              csl/lint_testing            -> origin/csl/lint_testing
2025-12-04T09:17:18.0310839Z  * [new branch]              csl/lint_thing              -> origin/csl/lint_thing
2025-12-04T09:17:18.0313014Z  * [new branch]              csl/lintrunner_stuff        -> origin/csl/lintrunner_stuff
2025-12-04T09:17:18.0314557Z  * [new branch]              csl/manually_gen_json       -> origin/csl/manually_gen_json
2025-12-04T09:17:18.0316563Z  * [new branch]              csl/mps_sharding            -> origin/csl/mps_sharding
2025-12-04T09:17:18.0318277Z  * [new branch]              csl/multistage_docker       -> origin/csl/multistage_docker
2025-12-04T09:17:18.0320340Z  * [new branch]              csl/print_timing            -> origin/csl/print_timing
2025-12-04T09:17:18.0322250Z  * [new branch]              csl/remove_experiment       -> origin/csl/remove_experiment
2025-12-04T09:17:18.0324211Z  * [new branch]              csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var
2025-12-04T09:17:18.0326258Z  * [new branch]              csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel
2025-12-04T09:17:18.0327872Z  * [new branch]              csl/remove_run_parallel     -> origin/csl/remove_run_parallel
2025-12-04T09:17:18.0329758Z  * [new branch]              csl/remove_unused_vars      -> origin/csl/remove_unused_vars
2025-12-04T09:17:18.0331632Z  * [new branch]              csl/revert_open             -> origin/csl/revert_open
2025-12-04T09:17:18.0333480Z  * [new branch]              csl/skip_build              -> origin/csl/skip_build
2025-12-04T09:17:18.0335393Z  * [new branch]              csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs
2025-12-04T09:17:18.0337172Z  * [new branch]              csl/td_job_level            -> origin/csl/td_job_level
2025-12-04T09:17:18.0339207Z  * [new branch]              csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-12-04T09:17:18.0341324Z  * [new branch]              csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn
2025-12-04T09:17:18.0342879Z  * [new branch]              csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence
2025-12-04T09:17:18.0345203Z  * [new branch]              csl/upload_json_running     -> origin/csl/upload_json_running
2025-12-04T09:17:18.0346478Z  * [new branch]              csl/win_sccache             -> origin/csl/win_sccache
2025-12-04T09:17:18.0348443Z  * [new branch]              csl/xml_stuff               -> origin/csl/xml_stuff
2025-12-04T09:17:18.0350401Z  * [new branch]              cublasrelax2                -> origin/cublasrelax2
2025-12-04T09:17:18.0352322Z  * [new branch]              cuda_mempool                -> origin/cuda_mempool
2025-12-04T09:17:18.0354168Z  * [new branch]              custom_lowering_dict        -> origin/custom_lowering_dict
2025-12-04T09:17:18.0356750Z  * [new branch]              d4l3k/debug_plane_frtrace   -> origin/d4l3k/debug_plane_frtrace
2025-12-04T09:17:18.0359332Z  * [new branch]              daxia6/2.8o3                -> origin/daxia6/2.8o3
2025-12-04T09:17:18.0361254Z  * [new branch]              debug-guard                 -> origin/debug-guard
2025-12-04T09:17:18.0363208Z  * [new branch]              delete-quant-docs           -> origin/delete-quant-docs
2025-12-04T09:17:18.0369368Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0
2025-12-04T09:17:18.0371240Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1
2025-12-04T09:17:18.0373517Z  * [new branch]              desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-12-04T09:17:18.0375212Z  * [new branch]              desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-12-04T09:17:18.0378360Z  * [new branch]              dev/dhruva/flex_attn_opt    -> origin/dev/dhruva/flex_attn_opt
2025-12-04T09:17:18.0381533Z  * [new branch]              dev/joona/MPSNDArrayAdd     -> origin/dev/joona/MPSNDArrayAdd
2025-12-04T09:17:18.0383285Z  * [new branch]              dev/joona/Unranked          -> origin/dev/joona/Unranked
2025-12-04T09:17:18.0385454Z  * [new branch]              dev/joona/cat               -> origin/dev/joona/cat
2025-12-04T09:17:18.0387323Z  * [new branch]              dev/joona/embeddingbag      -> origin/dev/joona/embeddingbag
2025-12-04T09:17:18.0389068Z  * [new branch]              dev/joona/fix_sdpa_memtest  -> origin/dev/joona/fix_sdpa_memtest
2025-12-04T09:17:18.0391341Z  * [new branch]              dev/joona/getTensorsString  -> origin/dev/joona/getTensorsString
2025-12-04T09:17:18.0393436Z  * [new branch]              dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14
2025-12-04T09:17:18.0395858Z  * [new branch]              dev/joona/scalar_clamp      -> origin/dev/joona/scalar_clamp
2025-12-04T09:17:18.0398198Z  * [new branch]              dev/joona/sdpa              -> origin/dev/joona/sdpa
2025-12-04T09:17:18.0400859Z  * [new branch]              dev/joona/sdpa_api          -> origin/dev/joona/sdpa_api
2025-12-04T09:17:18.0402931Z  * [new branch]              dev/joona/type_inf          -> origin/dev/joona/type_inf
2025-12-04T09:17:18.0405210Z  * [new branch]              dev/joona/ulpAssertClose    -> origin/dev/joona/ulpAssertClose
2025-12-04T09:17:18.0407036Z  * [new branch]              dev/joona/upsize3d          -> origin/dev/joona/upsize3d
2025-12-04T09:17:18.0408804Z  * [new branch]              disp_counter                -> origin/disp_counter
2025-12-04T09:17:18.0411177Z  * [new branch]              divyanshk-patch-1           -> origin/divyanshk-patch-1
2025-12-04T09:17:18.0412952Z  * [new branch]              docs                        -> origin/docs
2025-12-04T09:17:18.0414898Z  * [new branch]              documentation               -> origin/documentation
2025-12-04T09:17:18.0416754Z  * [new branch]              eager_model_benchmarks      -> origin/eager_model_benchmarks
2025-12-04T09:17:18.0419884Z  * [new branch]              embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control
2025-12-04T09:17:18.0421437Z  * [new branch]              embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B
2025-12-04T09:17:18.0423040Z  * [new branch]              embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B
2025-12-04T09:17:18.0425137Z  * [new branch]              eqy-patch-1                 -> origin/eqy-patch-1
2025-12-04T09:17:18.0427083Z  * [new branch]              eqy-patch-2                 -> origin/eqy-patch-2
2025-12-04T09:17:18.0428991Z  * [new branch]              eqy-patch-3                 -> origin/eqy-patch-3
2025-12-04T09:17:18.0430961Z  * [new branch]              eqy-patch-4                 -> origin/eqy-patch-4
2025-12-04T09:17:18.0432821Z  * [new branch]              eqy-patch-5                 -> origin/eqy-patch-5
2025-12-04T09:17:18.0434673Z  * [new branch]              eqy-patch-6                 -> origin/eqy-patch-6
2025-12-04T09:17:18.0437188Z  * [new branch]              exclamaforte/amd-ma         -> origin/exclamaforte/amd-ma
2025-12-04T09:17:18.0439259Z  * [new branch]              exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run
2025-12-04T09:17:18.0440609Z  * [new branch]              exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor
2025-12-04T09:17:18.0442679Z  * [new branch]              exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion
2025-12-04T09:17:18.0444452Z  * [new branch]              exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning
2025-12-04T09:17:18.0446801Z  * [new branch]              exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg
2025-12-04T09:17:18.0449188Z  * [new branch]              exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run
2025-12-04T09:17:18.0450633Z  * [new branch]              exclamaforte/fusion-data    -> origin/exclamaforte/fusion-data
2025-12-04T09:17:18.0453088Z  * [new branch]              exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run
2025-12-04T09:17:18.0454551Z  * [new branch]              exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model
2025-12-04T09:17:18.0456359Z  * [new branch]              exclamaforte/gemm-model     -> origin/exclamaforte/gemm-model
2025-12-04T09:17:18.0458671Z  * [new branch]              exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection
2025-12-04T09:17:18.0460302Z  * [new branch]              exclamaforte/gemm-to-amd    -> origin/exclamaforte/gemm-to-amd
2025-12-04T09:17:18.0462314Z  * [new branch]              exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model
2025-12-04T09:17:18.0464426Z  * [new branch]              exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor
2025-12-04T09:17:18.0466041Z  * [new branch]              exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo
2025-12-04T09:17:18.0468512Z  * [new branch]              exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization
2025-12-04T09:17:18.0470233Z  * [new branch]              exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode
2025-12-04T09:17:18.0472525Z  * [new branch]              exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs
2025-12-04T09:17:18.0474627Z  * [new branch]              exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2
2025-12-04T09:17:18.0476154Z  * [new branch]              exec                        -> origin/exec
2025-12-04T09:17:18.0478390Z  * [new branch]              experimental-mosaic         -> origin/experimental-mosaic
2025-12-04T09:17:18.0480383Z  * [new branch]              export-D61047529            -> origin/export-D61047529
2025-12-04T09:17:18.0482291Z  * [new branch]              export-D71412006            -> origin/export-D71412006
2025-12-04T09:17:18.0484289Z  * [new branch]              export-D73042989            -> origin/export-D73042989
2025-12-04T09:17:18.0486126Z  * [new branch]              export-D78957093            -> origin/export-D78957093
2025-12-04T09:17:18.0487950Z  * [new branch]              export-D78996107            -> origin/export-D78996107
2025-12-04T09:17:18.0489864Z  * [new branch]              export-D80823877            -> origin/export-D80823877
2025-12-04T09:17:18.0491824Z  * [new branch]              export-D80958642            -> origin/export-D80958642
2025-12-04T09:17:18.0493699Z  * [new branch]              export-D81054193            -> origin/export-D81054193
2025-12-04T09:17:18.0495514Z  * [new branch]              export-D81204584            -> origin/export-D81204584
2025-12-04T09:17:18.0497352Z  * [new branch]              export-D81429090            -> origin/export-D81429090
2025-12-04T09:17:18.0499503Z  * [new branch]              export-D82250826            -> origin/export-D82250826
2025-12-04T09:17:18.0501439Z  * [new branch]              export-D82253817            -> origin/export-D82253817
2025-12-04T09:17:18.0503393Z  * [new branch]              export-D83541846            -> origin/export-D83541846
2025-12-04T09:17:18.0505345Z  * [new branch]              export-D83627170            -> origin/export-D83627170
2025-12-04T09:17:18.0507246Z  * [new branch]              export-D83766701            -> origin/export-D83766701
2025-12-04T09:17:18.0509384Z  * [new branch]              export-D83768878            -> origin/export-D83768878
2025-12-04T09:17:18.0511246Z  * [new branch]              export-D83769447            -> origin/export-D83769447
2025-12-04T09:17:18.0513078Z  * [new branch]              export-D84089824            -> origin/export-D84089824
2025-12-04T09:17:18.0514967Z  * [new branch]              export-D84213020            -> origin/export-D84213020
2025-12-04T09:17:18.0517345Z  * [new branch]              export-D84373821            -> origin/export-D84373821
2025-12-04T09:17:18.0519610Z  * [new branch]              export-D84612194            -> origin/export-D84612194
2025-12-04T09:17:18.0521372Z  * [new branch]              export-D84890985            -> origin/export-D84890985
2025-12-04T09:17:18.0523301Z  * [new branch]              export-D85122326            -> origin/export-D85122326
2025-12-04T09:17:18.0525336Z  * [new branch]              export-D86256198            -> origin/export-D86256198
2025-12-04T09:17:18.0527165Z  * [new branch]              export-D86460608            -> origin/export-D86460608
2025-12-04T09:17:18.0529165Z  * [new branch]              export-D86474796            -> origin/export-D86474796
2025-12-04T09:17:18.0531216Z  * [new branch]              export-D86712396            -> origin/export-D86712396
2025-12-04T09:17:18.0533102Z  * [new branch]              export-D87022129            -> origin/export-D87022129
2025-12-04T09:17:18.0535048Z  * [new branch]              export-D87838959            -> origin/export-D87838959
2025-12-04T09:17:18.0537000Z  * [new branch]              export-D88319437            -> origin/export-D88319437
2025-12-04T09:17:18.0539216Z  * [new branch]              exported-model-train-idempotent -> origin/exported-model-train-idempotent
2025-12-04T09:17:18.0541164Z  * [new branch]              ezyang-titan-october        -> origin/ezyang-titan-october
2025-12-04T09:17:18.0542772Z  * [new branch]              ezyang-titan-october2       -> origin/ezyang-titan-october2
2025-12-04T09:17:18.0544696Z  * [new branch]              ezyang-war                  -> origin/ezyang-war
2025-12-04T09:17:18.0547144Z  * [new branch]              ezyang/wip-aot-descriptors  -> origin/ezyang/wip-aot-descriptors
2025-12-04T09:17:18.0549285Z  * [new branch]              fa_u8_brgemm                -> origin/fa_u8_brgemm
2025-12-04T09:17:18.0551908Z  * [new branch]              fadeputr/sequence_fbgemm    -> origin/fadeputr/sequence_fbgemm
2025-12-04T09:17:18.0553738Z  * [new branch]              fastmath_baseline           -> origin/fastmath_baseline
2025-12-04T09:17:18.0556323Z  * [new branch]              fbcode/warm                 -> origin/fbcode/warm
2025-12-04T09:17:18.0558262Z  * [new branch]              fca                         -> origin/fca
2025-12-04T09:17:18.0560174Z  * [new branch]              fca2_ca5984c                -> origin/fca2_ca5984c
2025-12-04T09:17:18.0561960Z  * [new branch]              fca5                        -> origin/fca5
2025-12-04T09:17:18.0564534Z  * [new branch]              feature/justknobs-cpp       -> origin/feature/justknobs-cpp
2025-12-04T09:17:18.0566461Z  * [new branch]              feature/numa-forkserver     -> origin/feature/numa-forkserver
2025-12-04T09:17:18.0568704Z  * [new branch]              ffast_math_baseline         -> origin/ffast_math_baseline
2025-12-04T09:17:18.0570545Z  * [new branch]              ffast_math_target           -> origin/ffast_math_target
2025-12-04T09:17:18.0573126Z  * [new branch]              findhao/base_commit         -> origin/findhao/base_commit
2025-12-04T09:17:18.0574970Z  * [new branch]              findhao/base_commit1        -> origin/findhao/base_commit1
2025-12-04T09:17:18.0576786Z  * [new branch]              findhao/multistream2        -> origin/findhao/multistream2
2025-12-04T09:17:18.0578286Z  * [new branch]              findhao/multistream5        -> origin/findhao/multistream5
2025-12-04T09:17:18.0580389Z  * [new branch]              findhao/multistream6        -> origin/findhao/multistream6
2025-12-04T09:17:18.0582364Z  * [new branch]              findhao/operatorbench3      -> origin/findhao/operatorbench3
2025-12-04T09:17:18.0583783Z  * [new branch]              findhao/operatorbench5      -> origin/findhao/operatorbench5
2025-12-04T09:17:18.0585682Z  * [new branch]              findhao/tritonparse         -> origin/findhao/tritonparse
2025-12-04T09:17:18.0587699Z  * [new branch]              fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format
2025-12-04T09:17:18.0589713Z  * [new branch]              fix-config-ignore           -> origin/fix-config-ignore
2025-12-04T09:17:18.0591230Z  * [new branch]              fix-dict-guard              -> origin/fix-dict-guard
2025-12-04T09:17:18.0593397Z  * [new branch]              fix_addmm_issue             -> origin/fix_addmm_issue
2025-12-04T09:17:18.0595376Z  * [new branch]              fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims
2025-12-04T09:17:18.0596992Z  * [new branch]              fix_bench_bwd_pass          -> origin/fix_bench_bwd_pass
2025-12-04T09:17:18.0599034Z  * [new branch]              fix_mem_profiler_config     -> origin/fix_mem_profiler_config
2025-12-04T09:17:18.0600834Z  * [new branch]              fix_nvrtc_discovery         -> origin/fix_nvrtc_discovery
2025-12-04T09:17:18.0602672Z  * [new branch]              fix_op_runner               -> origin/fix_op_runner
2025-12-04T09:17:18.0604536Z  * [new branch]              fix_ubn_159469              -> origin/fix_ubn_159469
2025-12-04T09:17:18.0606482Z  * [new branch]              fixes-triage                -> origin/fixes-triage
2025-12-04T09:17:18.0608882Z  * [new branch]              fixflashinfer               -> origin/fixflashinfer
2025-12-04T09:17:18.0610714Z  * [new branch]              flash_decoding_cpu          -> origin/flash_decoding_cpu
2025-12-04T09:17:18.0612503Z  * [new branch]              flex-flash                  -> origin/flex-flash
2025-12-04T09:17:18.0614539Z  * [new branch]              flex_attention_functorch_grad -> origin/flex_attention_functorch_grad
2025-12-04T09:17:18.0616358Z  * [new branch]              flex_flash                  -> origin/flex_flash
2025-12-04T09:17:18.0619124Z  * [new branch]              fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule
2025-12-04T09:17:18.0620809Z  * [new branch]              fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler
2025-12-04T09:17:18.0622682Z  * [new branch]              forkserver_fix              -> origin/forkserver_fix
2025-12-04T09:17:18.0624691Z  * [new branch]              fsdp2_trace_rules           -> origin/fsdp2_trace_rules
2025-12-04T09:17:18.0626719Z  * [new branch]              fx_cpp                      -> origin/fx_cpp
2025-12-04T09:17:18.0629237Z  * [new branch]              fy/fix-win                  -> origin/fy/fix-win
2025-12-04T09:17:18.0631245Z  * [new branch]              galv-patch-1                -> origin/galv-patch-1
2025-12-04T09:17:18.0634060Z  * [new branch]              galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4
2025-12-04T09:17:18.0636544Z  * [new branch]              georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch
2025-12-04T09:17:18.0640321Z  * [new branch]              gh/AlnisM/1/base            -> origin/gh/AlnisM/1/base
2025-12-04T09:17:18.0642213Z  * [new branch]              gh/AlnisM/1/head            -> origin/gh/AlnisM/1/head
2025-12-04T09:17:18.0645386Z  * [new branch]              gh/EikanWang/67/base        -> origin/gh/EikanWang/67/base
2025-12-04T09:17:18.0647219Z  * [new branch]              gh/EikanWang/67/head        -> origin/gh/EikanWang/67/head
2025-12-04T09:17:18.0650669Z  * [new branch]              gh/Gasoonjia/1/base         -> origin/gh/Gasoonjia/1/base
2025-12-04T09:17:18.0652512Z  * [new branch]              gh/Gasoonjia/1/head         -> origin/gh/Gasoonjia/1/head
2025-12-04T09:17:18.0655615Z  * [new branch]              gh/H-Huang/131/base         -> origin/gh/H-Huang/131/base
2025-12-04T09:17:18.0657414Z  * [new branch]              gh/H-Huang/131/head         -> origin/gh/H-Huang/131/head
2025-12-04T09:17:18.0659481Z  * [new branch]              gh/H-Huang/131/orig         -> origin/gh/H-Huang/131/orig
2025-12-04T09:17:18.0661983Z  * [new branch]              gh/H-Huang/132/base         -> origin/gh/H-Huang/132/base
2025-12-04T09:17:18.0663755Z  * [new branch]              gh/H-Huang/132/head         -> origin/gh/H-Huang/132/head
2025-12-04T09:17:18.0665620Z  * [new branch]              gh/H-Huang/132/orig         -> origin/gh/H-Huang/132/orig
2025-12-04T09:17:18.0668441Z  * [new branch]              gh/H-Huang/180/base         -> origin/gh/H-Huang/180/base
2025-12-04T09:17:18.0669815Z  * [new branch]              gh/H-Huang/180/head         -> origin/gh/H-Huang/180/head
2025-12-04T09:17:18.0671756Z  * [new branch]              gh/H-Huang/180/orig         -> origin/gh/H-Huang/180/orig
2025-12-04T09:17:18.0674258Z  * [new branch]              gh/H-Huang/182/base         -> origin/gh/H-Huang/182/base
2025-12-04T09:17:18.0676105Z  * [new branch]              gh/H-Huang/182/head         -> origin/gh/H-Huang/182/head
2025-12-04T09:17:18.0677915Z  * [new branch]              gh/H-Huang/182/orig         -> origin/gh/H-Huang/182/orig
2025-12-04T09:17:18.0680600Z  * [new branch]              gh/H-Huang/226/base         -> origin/gh/H-Huang/226/base
2025-12-04T09:17:18.0682435Z  * [new branch]              gh/H-Huang/226/head         -> origin/gh/H-Huang/226/head
2025-12-04T09:17:18.0684263Z  * [new branch]              gh/H-Huang/226/orig         -> origin/gh/H-Huang/226/orig
2025-12-04T09:17:18.0687325Z  * [new branch]              gh/H-Huang/228/base         -> origin/gh/H-Huang/228/base
2025-12-04T09:17:18.0689179Z  * [new branch]              gh/H-Huang/228/head         -> origin/gh/H-Huang/228/head
2025-12-04T09:17:18.0691014Z  * [new branch]              gh/H-Huang/228/orig         -> origin/gh/H-Huang/228/orig
2025-12-04T09:17:18.0694204Z  * [new branch]              gh/IvanKobzarev/150/base    -> origin/gh/IvanKobzarev/150/base
2025-12-04T09:17:18.0695778Z  * [new branch]              gh/IvanKobzarev/150/head    -> origin/gh/IvanKobzarev/150/head
2025-12-04T09:17:18.0697764Z  * [new branch]              gh/IvanKobzarev/150/orig    -> origin/gh/IvanKobzarev/150/orig
2025-12-04T09:17:18.0700579Z  * [new branch]              gh/IvanKobzarev/157/base    -> origin/gh/IvanKobzarev/157/base
2025-12-04T09:17:18.0702441Z  * [new branch]              gh/IvanKobzarev/157/head    -> origin/gh/IvanKobzarev/157/head
2025-12-04T09:17:18.0704589Z  * [new branch]              gh/IvanKobzarev/157/orig    -> origin/gh/IvanKobzarev/157/orig
2025-12-04T09:17:18.0706953Z  * [new branch]              gh/IvanKobzarev/159/base    -> origin/gh/IvanKobzarev/159/base
2025-12-04T09:17:18.0709029Z  * [new branch]              gh/IvanKobzarev/159/head    -> origin/gh/IvanKobzarev/159/head
2025-12-04T09:17:18.0710955Z  * [new branch]              gh/IvanKobzarev/159/orig    -> origin/gh/IvanKobzarev/159/orig
2025-12-04T09:17:18.0713479Z  * [new branch]              gh/IvanKobzarev/162/base    -> origin/gh/IvanKobzarev/162/base
2025-12-04T09:17:18.0715436Z  * [new branch]              gh/IvanKobzarev/162/head    -> origin/gh/IvanKobzarev/162/head
2025-12-04T09:17:18.0717030Z  * [new branch]              gh/IvanKobzarev/162/orig    -> origin/gh/IvanKobzarev/162/orig
2025-12-04T09:17:18.0719814Z  * [new branch]              gh/IvanKobzarev/163/base    -> origin/gh/IvanKobzarev/163/base
2025-12-04T09:17:18.0721675Z  * [new branch]              gh/IvanKobzarev/163/head    -> origin/gh/IvanKobzarev/163/head
2025-12-04T09:17:18.0723841Z  * [new branch]              gh/IvanKobzarev/163/orig    -> origin/gh/IvanKobzarev/163/orig
2025-12-04T09:17:18.0726509Z  * [new branch]              gh/IvanKobzarev/166/base    -> origin/gh/IvanKobzarev/166/base
2025-12-04T09:17:18.0728161Z  * [new branch]              gh/IvanKobzarev/166/head    -> origin/gh/IvanKobzarev/166/head
2025-12-04T09:17:18.0730113Z  * [new branch]              gh/IvanKobzarev/166/orig    -> origin/gh/IvanKobzarev/166/orig
2025-12-04T09:17:18.0732750Z  * [new branch]              gh/IvanKobzarev/167/base    -> origin/gh/IvanKobzarev/167/base
2025-12-04T09:17:18.0734276Z  * [new branch]              gh/IvanKobzarev/167/head    -> origin/gh/IvanKobzarev/167/head
2025-12-04T09:17:18.0736253Z  * [new branch]              gh/IvanKobzarev/167/orig    -> origin/gh/IvanKobzarev/167/orig
2025-12-04T09:17:18.0738816Z  * [new branch]              gh/IvanKobzarev/168/base    -> origin/gh/IvanKobzarev/168/base
2025-12-04T09:17:18.0741027Z  * [new branch]              gh/IvanKobzarev/168/head    -> origin/gh/IvanKobzarev/168/head
2025-12-04T09:17:18.0742439Z  * [new branch]              gh/IvanKobzarev/168/orig    -> origin/gh/IvanKobzarev/168/orig
2025-12-04T09:17:18.0745074Z  * [new branch]              gh/IvanKobzarev/169/base    -> origin/gh/IvanKobzarev/169/base
2025-12-04T09:17:18.0746750Z  * [new branch]              gh/IvanKobzarev/169/head    -> origin/gh/IvanKobzarev/169/head
2025-12-04T09:17:18.0748681Z  * [new branch]              gh/IvanKobzarev/169/orig    -> origin/gh/IvanKobzarev/169/orig
2025-12-04T09:17:18.0751265Z  * [new branch]              gh/IvanKobzarev/170/base    -> origin/gh/IvanKobzarev/170/base
2025-12-04T09:17:18.0752794Z  * [new branch]              gh/IvanKobzarev/170/head    -> origin/gh/IvanKobzarev/170/head
2025-12-04T09:17:18.0754704Z  * [new branch]              gh/IvanKobzarev/170/orig    -> origin/gh/IvanKobzarev/170/orig
2025-12-04T09:17:18.0757513Z  * [new branch]              gh/IvanKobzarev/171/base    -> origin/gh/IvanKobzarev/171/base
2025-12-04T09:17:18.0759128Z  * [new branch]              gh/IvanKobzarev/171/head    -> origin/gh/IvanKobzarev/171/head
2025-12-04T09:17:18.0761127Z  * [new branch]              gh/IvanKobzarev/171/orig    -> origin/gh/IvanKobzarev/171/orig
2025-12-04T09:17:18.0763931Z  * [new branch]              gh/IvanKobzarev/172/base    -> origin/gh/IvanKobzarev/172/base
2025-12-04T09:17:18.0765824Z  * [new branch]              gh/IvanKobzarev/172/head    -> origin/gh/IvanKobzarev/172/head
2025-12-04T09:17:18.0767438Z  * [new branch]              gh/IvanKobzarev/172/orig    -> origin/gh/IvanKobzarev/172/orig
2025-12-04T09:17:18.0770211Z  * [new branch]              gh/IvanKobzarev/173/base    -> origin/gh/IvanKobzarev/173/base
2025-12-04T09:17:18.0772103Z  * [new branch]              gh/IvanKobzarev/173/head    -> origin/gh/IvanKobzarev/173/head
2025-12-04T09:17:18.0773706Z  * [new branch]              gh/IvanKobzarev/173/orig    -> origin/gh/IvanKobzarev/173/orig
2025-12-04T09:17:18.0776371Z  * [new branch]              gh/IvanKobzarev/174/base    -> origin/gh/IvanKobzarev/174/base
2025-12-04T09:17:18.0778334Z  * [new branch]              gh/IvanKobzarev/174/head    -> origin/gh/IvanKobzarev/174/head
2025-12-04T09:17:18.0780275Z  * [new branch]              gh/IvanKobzarev/174/orig    -> origin/gh/IvanKobzarev/174/orig
2025-12-04T09:17:18.0782814Z  * [new branch]              gh/IvanKobzarev/175/base    -> origin/gh/IvanKobzarev/175/base
2025-12-04T09:17:18.0784682Z  * [new branch]              gh/IvanKobzarev/175/head    -> origin/gh/IvanKobzarev/175/head
2025-12-04T09:17:18.0786913Z  * [new branch]              gh/IvanKobzarev/175/orig    -> origin/gh/IvanKobzarev/175/orig
2025-12-04T09:17:18.0790151Z  * [new branch]              gh/IvanKobzarev/176/base    -> origin/gh/IvanKobzarev/176/base
2025-12-04T09:17:18.0792055Z  * [new branch]              gh/IvanKobzarev/176/head    -> origin/gh/IvanKobzarev/176/head
2025-12-04T09:17:18.0793640Z  * [new branch]              gh/IvanKobzarev/176/orig    -> origin/gh/IvanKobzarev/176/orig
2025-12-04T09:17:18.0796676Z  * [new branch]              gh/IvanKobzarev/177/base    -> origin/gh/IvanKobzarev/177/base
2025-12-04T09:17:18.0798596Z  * [new branch]              gh/IvanKobzarev/177/head    -> origin/gh/IvanKobzarev/177/head
2025-12-04T09:17:18.0800440Z  * [new branch]              gh/IvanKobzarev/177/orig    -> origin/gh/IvanKobzarev/177/orig
2025-12-04T09:17:18.0803175Z  * [new branch]              gh/IvanKobzarev/178/base    -> origin/gh/IvanKobzarev/178/base
2025-12-04T09:17:18.0805058Z  * [new branch]              gh/IvanKobzarev/178/head    -> origin/gh/IvanKobzarev/178/head
2025-12-04T09:17:18.0806931Z  * [new branch]              gh/IvanKobzarev/178/orig    -> origin/gh/IvanKobzarev/178/orig
2025-12-04T09:17:18.0809753Z  * [new branch]              gh/IvanKobzarev/179/base    -> origin/gh/IvanKobzarev/179/base
2025-12-04T09:17:18.0811317Z  * [new branch]              gh/IvanKobzarev/179/head    -> origin/gh/IvanKobzarev/179/head
2025-12-04T09:17:18.0813481Z  * [new branch]              gh/IvanKobzarev/179/orig    -> origin/gh/IvanKobzarev/179/orig
2025-12-04T09:17:18.0816077Z  * [new branch]              gh/IvanKobzarev/180/base    -> origin/gh/IvanKobzarev/180/base
2025-12-04T09:17:18.0817696Z  * [new branch]              gh/IvanKobzarev/180/head    -> origin/gh/IvanKobzarev/180/head
2025-12-04T09:17:18.0819850Z  * [new branch]              gh/IvanKobzarev/180/orig    -> origin/gh/IvanKobzarev/180/orig
2025-12-04T09:17:18.0822681Z  * [new branch]              gh/IvanKobzarev/181/base    -> origin/gh/IvanKobzarev/181/base
2025-12-04T09:17:18.0824519Z  * [new branch]              gh/IvanKobzarev/181/head    -> origin/gh/IvanKobzarev/181/head
2025-12-04T09:17:18.0836118Z  * [new branch]              gh/IvanKobzarev/181/orig    -> origin/gh/IvanKobzarev/181/orig
2025-12-04T09:17:18.0836937Z  * [new branch]              gh/IvanKobzarev/182/base    -> origin/gh/IvanKobzarev/182/base
2025-12-04T09:17:18.0837743Z  * [new branch]              gh/IvanKobzarev/182/head    -> origin/gh/IvanKobzarev/182/head
2025-12-04T09:17:18.0838477Z  * [new branch]              gh/IvanKobzarev/182/orig    -> origin/gh/IvanKobzarev/182/orig
2025-12-04T09:17:18.0839182Z  * [new branch]              gh/IvanKobzarev/183/base    -> origin/gh/IvanKobzarev/183/base
2025-12-04T09:17:18.0839969Z  * [new branch]              gh/IvanKobzarev/183/head    -> origin/gh/IvanKobzarev/183/head
2025-12-04T09:17:18.0840582Z  * [new branch]              gh/IvanKobzarev/183/orig    -> origin/gh/IvanKobzarev/183/orig
2025-12-04T09:17:18.0841763Z  * [new branch]              gh/IvanKobzarev/184/base    -> origin/gh/IvanKobzarev/184/base
2025-12-04T09:17:18.0843838Z  * [new branch]              gh/IvanKobzarev/184/head    -> origin/gh/IvanKobzarev/184/head
2025-12-04T09:17:18.0845782Z  * [new branch]              gh/IvanKobzarev/184/orig    -> origin/gh/IvanKobzarev/184/orig
2025-12-04T09:17:18.0848868Z  * [new branch]              gh/NikhilAPatel/1/base      -> origin/gh/NikhilAPatel/1/base
2025-12-04T09:17:18.0850798Z  * [new branch]              gh/NikhilAPatel/1/head      -> origin/gh/NikhilAPatel/1/head
2025-12-04T09:17:18.0853182Z  * [new branch]              gh/NikhilAPatel/2/base      -> origin/gh/NikhilAPatel/2/base
2025-12-04T09:17:18.0854777Z  * [new branch]              gh/NikhilAPatel/2/head      -> origin/gh/NikhilAPatel/2/head
2025-12-04T09:17:18.0857689Z  * [new branch]              gh/NikhilAPatel/4/base      -> origin/gh/NikhilAPatel/4/base
2025-12-04T09:17:18.0859971Z  * [new branch]              gh/NikhilAPatel/4/head      -> origin/gh/NikhilAPatel/4/head
2025-12-04T09:17:18.0862273Z  * [new branch]              gh/NikhilAPatel/5/base      -> origin/gh/NikhilAPatel/5/base
2025-12-04T09:17:18.0864154Z  * [new branch]              gh/NikhilAPatel/5/head      -> origin/gh/NikhilAPatel/5/head
2025-12-04T09:17:18.0866083Z  * [new branch]              gh/NikhilAPatel/5/orig      -> origin/gh/NikhilAPatel/5/orig
2025-12-04T09:17:18.0869113Z  * [new branch]              gh/PaliC/17/base            -> origin/gh/PaliC/17/base
2025-12-04T09:17:18.0870940Z  * [new branch]              gh/PaliC/17/head            -> origin/gh/PaliC/17/head
2025-12-04T09:17:18.0872768Z  * [new branch]              gh/PaliC/17/orig            -> origin/gh/PaliC/17/orig
2025-12-04T09:17:18.0875352Z  * [new branch]              gh/PaliC/18/base            -> origin/gh/PaliC/18/base
2025-12-04T09:17:18.0877202Z  * [new branch]              gh/PaliC/18/head            -> origin/gh/PaliC/18/head
2025-12-04T09:17:18.0879103Z  * [new branch]              gh/PaliC/18/orig            -> origin/gh/PaliC/18/orig
2025-12-04T09:17:18.0881603Z  * [new branch]              gh/PaliC/20/base            -> origin/gh/PaliC/20/base
2025-12-04T09:17:18.0883425Z  * [new branch]              gh/PaliC/20/head            -> origin/gh/PaliC/20/head
2025-12-04T09:17:18.0885276Z  * [new branch]              gh/PaliC/20/orig            -> origin/gh/PaliC/20/orig
2025-12-04T09:17:18.0887798Z  * [new branch]              gh/PaliC/21/base            -> origin/gh/PaliC/21/base
2025-12-04T09:17:18.0889792Z  * [new branch]              gh/PaliC/21/head            -> origin/gh/PaliC/21/head
2025-12-04T09:17:18.0891276Z  * [new branch]              gh/PaliC/21/orig            -> origin/gh/PaliC/21/orig
2025-12-04T09:17:18.0893887Z  * [new branch]              gh/PaliC/23/base            -> origin/gh/PaliC/23/base
2025-12-04T09:17:18.0895541Z  * [new branch]              gh/PaliC/23/head            -> origin/gh/PaliC/23/head
2025-12-04T09:17:18.0897696Z  * [new branch]              gh/PaliC/23/orig            -> origin/gh/PaliC/23/orig
2025-12-04T09:17:18.0900208Z  * [new branch]              gh/PaliC/24/base            -> origin/gh/PaliC/24/base
2025-12-04T09:17:18.0902028Z  * [new branch]              gh/PaliC/24/head            -> origin/gh/PaliC/24/head
2025-12-04T09:17:18.0903840Z  * [new branch]              gh/PaliC/24/orig            -> origin/gh/PaliC/24/orig
2025-12-04T09:17:18.0906312Z  * [new branch]              gh/PaliC/25/head            -> origin/gh/PaliC/25/head
2025-12-04T09:17:18.0908344Z  * [new branch]              gh/PaliC/25/next            -> origin/gh/PaliC/25/next
2025-12-04T09:17:18.0910193Z  * [new branch]              gh/PaliC/25/orig            -> origin/gh/PaliC/25/orig
2025-12-04T09:17:18.0912674Z  * [new branch]              gh/PaliC/26/head            -> origin/gh/PaliC/26/head
2025-12-04T09:17:18.0914134Z  * [new branch]              gh/PaliC/26/next            -> origin/gh/PaliC/26/next
2025-12-04T09:17:18.0916098Z  * [new branch]              gh/PaliC/26/orig            -> origin/gh/PaliC/26/orig
2025-12-04T09:17:18.0918647Z  * [new branch]              gh/PaliC/27/next            -> origin/gh/PaliC/27/next
2025-12-04T09:17:18.0921148Z  * [new branch]              gh/PaliC/28/head            -> origin/gh/PaliC/28/head
2025-12-04T09:17:18.0922614Z  * [new branch]              gh/PaliC/28/next            -> origin/gh/PaliC/28/next
2025-12-04T09:17:18.0924606Z  * [new branch]              gh/PaliC/28/orig            -> origin/gh/PaliC/28/orig
2025-12-04T09:17:18.0927192Z  * [new branch]              gh/PaliC/29/head            -> origin/gh/PaliC/29/head
2025-12-04T09:17:18.0928701Z  * [new branch]              gh/PaliC/29/next            -> origin/gh/PaliC/29/next
2025-12-04T09:17:18.0930681Z  * [new branch]              gh/PaliC/29/orig            -> origin/gh/PaliC/29/orig
2025-12-04T09:17:18.0933271Z  * [new branch]              gh/PaliC/30/head            -> origin/gh/PaliC/30/head
2025-12-04T09:17:18.0934737Z  * [new branch]              gh/PaliC/30/next            -> origin/gh/PaliC/30/next
2025-12-04T09:17:18.0936672Z  * [new branch]              gh/PaliC/30/orig            -> origin/gh/PaliC/30/orig
2025-12-04T09:17:18.0939326Z  * [new branch]              gh/PaliC/31/head            -> origin/gh/PaliC/31/head
2025-12-04T09:17:18.0941166Z  * [new branch]              gh/PaliC/31/next            -> origin/gh/PaliC/31/next
2025-12-04T09:17:18.0943006Z  * [new branch]              gh/PaliC/31/orig            -> origin/gh/PaliC/31/orig
2025-12-04T09:17:18.0946041Z  * [new branch]              gh/PaulZhang12/25/base      -> origin/gh/PaulZhang12/25/base
2025-12-04T09:17:18.0948058Z  * [new branch]              gh/PaulZhang12/25/head      -> origin/gh/PaulZhang12/25/head
2025-12-04T09:17:18.0949621Z  * [new branch]              gh/PaulZhang12/25/orig      -> origin/gh/PaulZhang12/25/orig
2025-12-04T09:17:18.0952322Z  * [new branch]              gh/PaulZhang12/28/base      -> origin/gh/PaulZhang12/28/base
2025-12-04T09:17:18.0954266Z  * [new branch]              gh/PaulZhang12/28/head      -> origin/gh/PaulZhang12/28/head
2025-12-04T09:17:18.0956126Z  * [new branch]              gh/PaulZhang12/28/orig      -> origin/gh/PaulZhang12/28/orig
2025-12-04T09:17:18.0959009Z  * [new branch]              gh/PaulZhang12/31/base      -> origin/gh/PaulZhang12/31/base
2025-12-04T09:17:18.0961650Z  * [new branch]              gh/PaulZhang12/31/head      -> origin/gh/PaulZhang12/31/head
2025-12-04T09:17:18.0963244Z  * [new branch]              gh/PaulZhang12/31/orig      -> origin/gh/PaulZhang12/31/orig
2025-12-04T09:17:18.0965972Z  * [new branch]              gh/PaulZhang12/37/base      -> origin/gh/PaulZhang12/37/base
2025-12-04T09:17:18.0967490Z  * [new branch]              gh/PaulZhang12/37/head      -> origin/gh/PaulZhang12/37/head
2025-12-04T09:17:18.0969523Z  * [new branch]              gh/PaulZhang12/37/orig      -> origin/gh/PaulZhang12/37/orig
2025-12-04T09:17:18.0972134Z  * [new branch]              gh/PaulZhang12/40/base      -> origin/gh/PaulZhang12/40/base
2025-12-04T09:17:18.0974085Z  * [new branch]              gh/PaulZhang12/40/head      -> origin/gh/PaulZhang12/40/head
2025-12-04T09:17:18.0976028Z  * [new branch]              gh/PaulZhang12/40/orig      -> origin/gh/PaulZhang12/40/orig
2025-12-04T09:17:18.0978708Z  * [new branch]              gh/PaulZhang12/42/base      -> origin/gh/PaulZhang12/42/base
2025-12-04T09:17:18.0980663Z  * [new branch]              gh/PaulZhang12/42/head      -> origin/gh/PaulZhang12/42/head
2025-12-04T09:17:18.0983186Z  * [new branch]              gh/PaulZhang12/43/base      -> origin/gh/PaulZhang12/43/base
2025-12-04T09:17:18.0985055Z  * [new branch]              gh/PaulZhang12/43/head      -> origin/gh/PaulZhang12/43/head
2025-12-04T09:17:18.0986905Z  * [new branch]              gh/PaulZhang12/43/orig      -> origin/gh/PaulZhang12/43/orig
2025-12-04T09:17:18.0989318Z  * [new branch]              gh/PaulZhang12/44/base      -> origin/gh/PaulZhang12/44/base
2025-12-04T09:17:18.0991175Z  * [new branch]              gh/PaulZhang12/44/head      -> origin/gh/PaulZhang12/44/head
2025-12-04T09:17:18.0993795Z  * [new branch]              gh/PaulZhang12/45/base      -> origin/gh/PaulZhang12/45/base
2025-12-04T09:17:18.0995346Z  * [new branch]              gh/PaulZhang12/45/head      -> origin/gh/PaulZhang12/45/head
2025-12-04T09:17:18.0997256Z  * [new branch]              gh/PaulZhang12/45/orig      -> origin/gh/PaulZhang12/45/orig
2025-12-04T09:17:18.0999895Z  * [new branch]              gh/PaulZhang12/46/base      -> origin/gh/PaulZhang12/46/base
2025-12-04T09:17:18.1001772Z  * [new branch]              gh/PaulZhang12/46/head      -> origin/gh/PaulZhang12/46/head
2025-12-04T09:17:18.1003640Z  * [new branch]              gh/PaulZhang12/46/orig      -> origin/gh/PaulZhang12/46/orig
2025-12-04T09:17:18.1006221Z  * [new branch]              gh/PaulZhang12/47/base      -> origin/gh/PaulZhang12/47/base
2025-12-04T09:17:18.1008333Z  * [new branch]              gh/PaulZhang12/47/head      -> origin/gh/PaulZhang12/47/head
2025-12-04T09:17:18.1011649Z  * [new branch]              gh/PaulZhang12/47/orig      -> origin/gh/PaulZhang12/47/orig
2025-12-04T09:17:18.1013935Z  * [new branch]              gh/PaulZhang12/48/base      -> origin/gh/PaulZhang12/48/base
2025-12-04T09:17:18.1015538Z  * [new branch]              gh/PaulZhang12/48/head      -> origin/gh/PaulZhang12/48/head
2025-12-04T09:17:18.1017518Z  * [new branch]              gh/PaulZhang12/48/orig      -> origin/gh/PaulZhang12/48/orig
2025-12-04T09:17:18.1020917Z  * [new branch]              gh/SamGinzburg/11/base      -> origin/gh/SamGinzburg/11/base
2025-12-04T09:17:18.1022390Z  * [new branch]              gh/SamGinzburg/11/head      -> origin/gh/SamGinzburg/11/head
2025-12-04T09:17:18.1025834Z  * [new branch]              gh/SherlockNoMad/1/base     -> origin/gh/SherlockNoMad/1/base
2025-12-04T09:17:18.1027447Z  * [new branch]              gh/SherlockNoMad/1/head     -> origin/gh/SherlockNoMad/1/head
2025-12-04T09:17:18.1030244Z  * [new branch]              gh/SherlockNoMad/10/base    -> origin/gh/SherlockNoMad/10/base
2025-12-04T09:17:18.1032173Z  * [new branch]              gh/SherlockNoMad/10/head    -> origin/gh/SherlockNoMad/10/head
2025-12-04T09:17:18.1034081Z  * [new branch]              gh/SherlockNoMad/10/orig    -> origin/gh/SherlockNoMad/10/orig
2025-12-04T09:17:18.1036570Z  * [new branch]              gh/SherlockNoMad/11/base    -> origin/gh/SherlockNoMad/11/base
2025-12-04T09:17:18.1038170Z  * [new branch]              gh/SherlockNoMad/11/head    -> origin/gh/SherlockNoMad/11/head
2025-12-04T09:17:18.1040407Z  * [new branch]              gh/SherlockNoMad/11/orig    -> origin/gh/SherlockNoMad/11/orig
2025-12-04T09:17:18.1042667Z  * [new branch]              gh/SherlockNoMad/12/base    -> origin/gh/SherlockNoMad/12/base
2025-12-04T09:17:18.1044258Z  * [new branch]              gh/SherlockNoMad/12/head    -> origin/gh/SherlockNoMad/12/head
2025-12-04T09:17:18.1046008Z  * [new branch]              gh/SherlockNoMad/12/orig    -> origin/gh/SherlockNoMad/12/orig
2025-12-04T09:17:18.1048812Z  * [new branch]              gh/SherlockNoMad/15/base    -> origin/gh/SherlockNoMad/15/base
2025-12-04T09:17:18.1050669Z  * [new branch]              gh/SherlockNoMad/15/head    -> origin/gh/SherlockNoMad/15/head
2025-12-04T09:17:18.1052570Z  * [new branch]              gh/SherlockNoMad/15/orig    -> origin/gh/SherlockNoMad/15/orig
2025-12-04T09:17:18.1055052Z  * [new branch]              gh/SherlockNoMad/17/base    -> origin/gh/SherlockNoMad/17/base
2025-12-04T09:17:18.1056903Z  * [new branch]              gh/SherlockNoMad/17/head    -> origin/gh/SherlockNoMad/17/head
2025-12-04T09:17:18.1058507Z  * [new branch]              gh/SherlockNoMad/17/orig    -> origin/gh/SherlockNoMad/17/orig
2025-12-04T09:17:18.1061542Z  * [new branch]              gh/SherlockNoMad/18/base    -> origin/gh/SherlockNoMad/18/base
2025-12-04T09:17:18.1063401Z  * [new branch]              gh/SherlockNoMad/18/head    -> origin/gh/SherlockNoMad/18/head
2025-12-04T09:17:18.1065032Z  * [new branch]              gh/SherlockNoMad/18/orig    -> origin/gh/SherlockNoMad/18/orig
2025-12-04T09:17:18.1067546Z  * [new branch]              gh/SherlockNoMad/19/base    -> origin/gh/SherlockNoMad/19/base
2025-12-04T09:17:18.1069500Z  * [new branch]              gh/SherlockNoMad/19/head    -> origin/gh/SherlockNoMad/19/head
2025-12-04T09:17:18.1071395Z  * [new branch]              gh/SherlockNoMad/19/orig    -> origin/gh/SherlockNoMad/19/orig
2025-12-04T09:17:18.1073789Z  * [new branch]              gh/SherlockNoMad/2/base     -> origin/gh/SherlockNoMad/2/base
2025-12-04T09:17:18.1075361Z  * [new branch]              gh/SherlockNoMad/2/head     -> origin/gh/SherlockNoMad/2/head
2025-12-04T09:17:18.1077883Z  * [new branch]              gh/SherlockNoMad/20/base    -> origin/gh/SherlockNoMad/20/base
2025-12-04T09:17:18.1079956Z  * [new branch]              gh/SherlockNoMad/20/head    -> origin/gh/SherlockNoMad/20/head
2025-12-04T09:17:18.1081518Z  * [new branch]              gh/SherlockNoMad/20/orig    -> origin/gh/SherlockNoMad/20/orig
2025-12-04T09:17:18.1084384Z  * [new branch]              gh/SherlockNoMad/21/base    -> origin/gh/SherlockNoMad/21/base
2025-12-04T09:17:18.1086376Z  * [new branch]              gh/SherlockNoMad/21/head    -> origin/gh/SherlockNoMad/21/head
2025-12-04T09:17:18.1087912Z  * [new branch]              gh/SherlockNoMad/21/orig    -> origin/gh/SherlockNoMad/21/orig
2025-12-04T09:17:18.1090458Z  * [new branch]              gh/SherlockNoMad/3/base     -> origin/gh/SherlockNoMad/3/base
2025-12-04T09:17:18.1092026Z  * [new branch]              gh/SherlockNoMad/3/head     -> origin/gh/SherlockNoMad/3/head
2025-12-04T09:17:18.1094509Z  * [new branch]              gh/SherlockNoMad/4/base     -> origin/gh/SherlockNoMad/4/base
2025-12-04T09:17:18.1096141Z  * [new branch]              gh/SherlockNoMad/4/head     -> origin/gh/SherlockNoMad/4/head
2025-12-04T09:17:18.1098750Z  * [new branch]              gh/SherlockNoMad/5/base     -> origin/gh/SherlockNoMad/5/base
2025-12-04T09:17:18.1100792Z  * [new branch]              gh/SherlockNoMad/5/head     -> origin/gh/SherlockNoMad/5/head
2025-12-04T09:17:18.1104575Z  * [new branch]              gh/Sidharth123-cpu/24/base  -> origin/gh/Sidharth123-cpu/24/base
2025-12-04T09:17:18.1107040Z  * [new branch]              gh/Sidharth123-cpu/25/base  -> origin/gh/Sidharth123-cpu/25/base
2025-12-04T09:17:18.1109690Z  * [new branch]              gh/Sidharth123-cpu/26/base  -> origin/gh/Sidharth123-cpu/26/base
2025-12-04T09:17:18.1112235Z  * [new branch]              gh/Sidharth123-cpu/27/base  -> origin/gh/Sidharth123-cpu/27/base
2025-12-04T09:17:18.1115520Z  * [new branch]              gh/StrongerXi/1/base        -> origin/gh/StrongerXi/1/base
2025-12-04T09:17:18.1117462Z  * [new branch]              gh/StrongerXi/1/head        -> origin/gh/StrongerXi/1/head
2025-12-04T09:17:18.1120039Z  * [new branch]              gh/StrongerXi/71/base       -> origin/gh/StrongerXi/71/base
2025-12-04T09:17:18.1121718Z  * [new branch]              gh/StrongerXi/71/head       -> origin/gh/StrongerXi/71/head
2025-12-04T09:17:18.1124231Z  * [new branch]              gh/StrongerXi/72/base       -> origin/gh/StrongerXi/72/base
2025-12-04T09:17:18.1126096Z  * [new branch]              gh/StrongerXi/72/head       -> origin/gh/StrongerXi/72/head
2025-12-04T09:17:18.1128603Z  * [new branch]              gh/StrongerXi/73/base       -> origin/gh/StrongerXi/73/base
2025-12-04T09:17:18.1130412Z  * [new branch]              gh/StrongerXi/73/head       -> origin/gh/StrongerXi/73/head
2025-12-04T09:17:18.1132283Z  * [new branch]              gh/StrongerXi/73/orig       -> origin/gh/StrongerXi/73/orig
2025-12-04T09:17:18.1135440Z  * [new branch]              gh/XilunWu/160/base         -> origin/gh/XilunWu/160/base
2025-12-04T09:17:18.1137228Z  * [new branch]              gh/XilunWu/160/head         -> origin/gh/XilunWu/160/head
2025-12-04T09:17:18.1139158Z  * [new branch]              gh/XilunWu/160/orig         -> origin/gh/XilunWu/160/orig
2025-12-04T09:17:18.1141800Z  * [new branch]              gh/XilunWu/163/base         -> origin/gh/XilunWu/163/base
2025-12-04T09:17:18.1143639Z  * [new branch]              gh/XilunWu/163/head         -> origin/gh/XilunWu/163/head
2025-12-04T09:17:18.1145462Z  * [new branch]              gh/XilunWu/163/orig         -> origin/gh/XilunWu/163/orig
2025-12-04T09:17:18.1148126Z  * [new branch]              gh/XilunWu/168/base         -> origin/gh/XilunWu/168/base
2025-12-04T09:17:18.1149918Z  * [new branch]              gh/XilunWu/168/head         -> origin/gh/XilunWu/168/head
2025-12-04T09:17:18.1151465Z  * [new branch]              gh/XilunWu/168/orig         -> origin/gh/XilunWu/168/orig
2025-12-04T09:17:18.1154286Z  * [new branch]              gh/XilunWu/169/base         -> origin/gh/XilunWu/169/base
2025-12-04T09:17:18.1156178Z  * [new branch]              gh/XilunWu/169/head         -> origin/gh/XilunWu/169/head
2025-12-04T09:17:18.1158054Z  * [new branch]              gh/XilunWu/169/orig         -> origin/gh/XilunWu/169/orig
2025-12-04T09:17:18.1160423Z  * [new branch]              gh/XilunWu/170/base         -> origin/gh/XilunWu/170/base
2025-12-04T09:17:18.1162231Z  * [new branch]              gh/XilunWu/170/head         -> origin/gh/XilunWu/170/head
2025-12-04T09:17:18.1164054Z  * [new branch]              gh/XilunWu/170/orig         -> origin/gh/XilunWu/170/orig
2025-12-04T09:17:18.1166667Z  * [new branch]              gh/XilunWu/171/base         -> origin/gh/XilunWu/171/base
2025-12-04T09:17:18.1168649Z  * [new branch]              gh/XilunWu/171/head         -> origin/gh/XilunWu/171/head
2025-12-04T09:17:18.1170469Z  * [new branch]              gh/XilunWu/171/orig         -> origin/gh/XilunWu/171/orig
2025-12-04T09:17:18.1172887Z  * [new branch]              gh/XilunWu/173/base         -> origin/gh/XilunWu/173/base
2025-12-04T09:17:18.1174823Z  * [new branch]              gh/XilunWu/173/head         -> origin/gh/XilunWu/173/head
2025-12-04T09:17:18.1176635Z  * [new branch]              gh/XilunWu/173/orig         -> origin/gh/XilunWu/173/orig
2025-12-04T09:17:18.1179200Z  * [new branch]              gh/XilunWu/175/base         -> origin/gh/XilunWu/175/base
2025-12-04T09:17:18.1181228Z  * [new branch]              gh/XilunWu/175/head         -> origin/gh/XilunWu/175/head
2025-12-04T09:17:18.1183059Z  * [new branch]              gh/XilunWu/175/orig         -> origin/gh/XilunWu/175/orig
2025-12-04T09:17:18.1185623Z  * [new branch]              gh/XilunWu/176/base         -> origin/gh/XilunWu/176/base
2025-12-04T09:17:18.1187490Z  * [new branch]              gh/XilunWu/176/head         -> origin/gh/XilunWu/176/head
2025-12-04T09:17:18.1189494Z  * [new branch]              gh/XilunWu/176/orig         -> origin/gh/XilunWu/176/orig
2025-12-04T09:17:18.1192481Z  * [new branch]              gh/XuehaiPan/14/base        -> origin/gh/XuehaiPan/14/base
2025-12-04T09:17:18.1194305Z  * [new branch]              gh/XuehaiPan/14/head        -> origin/gh/XuehaiPan/14/head
2025-12-04T09:17:18.1195905Z  * [new branch]              gh/XuehaiPan/14/orig        -> origin/gh/XuehaiPan/14/orig
2025-12-04T09:17:18.1198691Z  * [new branch]              gh/XuehaiPan/179/base       -> origin/gh/XuehaiPan/179/base
2025-12-04T09:17:18.1200595Z  * [new branch]              gh/XuehaiPan/179/head       -> origin/gh/XuehaiPan/179/head
2025-12-04T09:17:18.1202564Z  * [new branch]              gh/XuehaiPan/179/orig       -> origin/gh/XuehaiPan/179/orig
2025-12-04T09:17:18.1205051Z  * [new branch]              gh/XuehaiPan/249/base       -> origin/gh/XuehaiPan/249/base
2025-12-04T09:17:18.1206878Z  * [new branch]              gh/XuehaiPan/249/head       -> origin/gh/XuehaiPan/249/head
2025-12-04T09:17:18.1208660Z  * [new branch]              gh/XuehaiPan/249/orig       -> origin/gh/XuehaiPan/249/orig
2025-12-04T09:17:18.1211430Z  * [new branch]              gh/XuehaiPan/253/base       -> origin/gh/XuehaiPan/253/base
2025-12-04T09:17:18.1213299Z  * [new branch]              gh/XuehaiPan/253/head       -> origin/gh/XuehaiPan/253/head
2025-12-04T09:17:18.1215166Z  * [new branch]              gh/XuehaiPan/253/orig       -> origin/gh/XuehaiPan/253/orig
2025-12-04T09:17:18.1217692Z  * [new branch]              gh/XuehaiPan/254/base       -> origin/gh/XuehaiPan/254/base
2025-12-04T09:17:18.1219701Z  * [new branch]              gh/XuehaiPan/254/head       -> origin/gh/XuehaiPan/254/head
2025-12-04T09:17:18.1221496Z  * [new branch]              gh/XuehaiPan/254/orig       -> origin/gh/XuehaiPan/254/orig
2025-12-04T09:17:18.1224063Z  * [new branch]              gh/XuehaiPan/255/base       -> origin/gh/XuehaiPan/255/base
2025-12-04T09:17:18.1226061Z  * [new branch]              gh/XuehaiPan/255/head       -> origin/gh/XuehaiPan/255/head
2025-12-04T09:17:18.1227919Z  * [new branch]              gh/XuehaiPan/255/orig       -> origin/gh/XuehaiPan/255/orig
2025-12-04T09:17:18.1230447Z  * [new branch]              gh/XuehaiPan/271/base       -> origin/gh/XuehaiPan/271/base
2025-12-04T09:17:18.1232273Z  * [new branch]              gh/XuehaiPan/271/head       -> origin/gh/XuehaiPan/271/head
2025-12-04T09:17:18.1234238Z  * [new branch]              gh/XuehaiPan/271/orig       -> origin/gh/XuehaiPan/271/orig
2025-12-04T09:17:18.1236798Z  * [new branch]              gh/XuehaiPan/343/base       -> origin/gh/XuehaiPan/343/base
2025-12-04T09:17:18.1238613Z  * [new branch]              gh/XuehaiPan/343/head       -> origin/gh/XuehaiPan/343/head
2025-12-04T09:17:18.1240440Z  * [new branch]              gh/XuehaiPan/343/orig       -> origin/gh/XuehaiPan/343/orig
2025-12-04T09:17:18.1242993Z  * [new branch]              gh/XuehaiPan/347/base       -> origin/gh/XuehaiPan/347/base
2025-12-04T09:17:18.1245083Z  * [new branch]              gh/XuehaiPan/347/head       -> origin/gh/XuehaiPan/347/head
2025-12-04T09:17:18.1246501Z  * [new branch]              gh/XuehaiPan/347/orig       -> origin/gh/XuehaiPan/347/orig
2025-12-04T09:17:18.1249122Z  * [new branch]              gh/XuehaiPan/348/base       -> origin/gh/XuehaiPan/348/base
2025-12-04T09:17:18.1250994Z  * [new branch]              gh/XuehaiPan/348/head       -> origin/gh/XuehaiPan/348/head
2025-12-04T09:17:18.1252795Z  * [new branch]              gh/XuehaiPan/348/orig       -> origin/gh/XuehaiPan/348/orig
2025-12-04T09:17:18.1255298Z  * [new branch]              gh/XuehaiPan/350/base       -> origin/gh/XuehaiPan/350/base
2025-12-04T09:17:18.1257102Z  * [new branch]              gh/XuehaiPan/350/head       -> origin/gh/XuehaiPan/350/head
2025-12-04T09:17:18.1259036Z  * [new branch]              gh/XuehaiPan/350/orig       -> origin/gh/XuehaiPan/350/orig
2025-12-04T09:17:18.1261890Z  * [new branch]              gh/XuehaiPan/365/base       -> origin/gh/XuehaiPan/365/base
2025-12-04T09:17:18.1263405Z  * [new branch]              gh/XuehaiPan/365/head       -> origin/gh/XuehaiPan/365/head
2025-12-04T09:17:18.1265392Z  * [new branch]              gh/XuehaiPan/365/orig       -> origin/gh/XuehaiPan/365/orig
2025-12-04T09:17:18.1268084Z  * [new branch]              gh/XuehaiPan/366/base       -> origin/gh/XuehaiPan/366/base
2025-12-04T09:17:18.1269709Z  * [new branch]              gh/XuehaiPan/366/head       -> origin/gh/XuehaiPan/366/head
2025-12-04T09:17:18.1272339Z  * [new branch]              gh/XuehaiPan/370/base       -> origin/gh/XuehaiPan/370/base
2025-12-04T09:17:18.1274199Z  * [new branch]              gh/XuehaiPan/370/head       -> origin/gh/XuehaiPan/370/head
2025-12-04T09:17:18.1275772Z  * [new branch]              gh/XuehaiPan/370/orig       -> origin/gh/XuehaiPan/370/orig
2025-12-04T09:17:18.1278589Z  * [new branch]              gh/XuehaiPan/390/base       -> origin/gh/XuehaiPan/390/base
2025-12-04T09:17:18.1280462Z  * [new branch]              gh/XuehaiPan/390/head       -> origin/gh/XuehaiPan/390/head
2025-12-04T09:17:18.1282258Z  * [new branch]              gh/XuehaiPan/390/orig       -> origin/gh/XuehaiPan/390/orig
2025-12-04T09:17:18.1284830Z  * [new branch]              gh/XuehaiPan/391/base       -> origin/gh/XuehaiPan/391/base
2025-12-04T09:17:18.1286384Z  * [new branch]              gh/XuehaiPan/391/head       -> origin/gh/XuehaiPan/391/head
2025-12-04T09:17:18.1288411Z  * [new branch]              gh/XuehaiPan/391/orig       -> origin/gh/XuehaiPan/391/orig
2025-12-04T09:17:18.1290879Z  * [new branch]              gh/XuehaiPan/392/base       -> origin/gh/XuehaiPan/392/base
2025-12-04T09:17:18.1292701Z  * [new branch]              gh/XuehaiPan/392/head       -> origin/gh/XuehaiPan/392/head
2025-12-04T09:17:18.1294588Z  * [new branch]              gh/XuehaiPan/392/orig       -> origin/gh/XuehaiPan/392/orig
2025-12-04T09:17:18.1297652Z  * [new branch]              gh/XuehaiPan/394/base       -> origin/gh/XuehaiPan/394/base
2025-12-04T09:17:18.1299668Z  * [new branch]              gh/XuehaiPan/394/head       -> origin/gh/XuehaiPan/394/head
2025-12-04T09:17:18.1301471Z  * [new branch]              gh/XuehaiPan/394/orig       -> origin/gh/XuehaiPan/394/orig
2025-12-04T09:17:18.1304098Z  * [new branch]              gh/XuehaiPan/397/base       -> origin/gh/XuehaiPan/397/base
2025-12-04T09:17:18.1305947Z  * [new branch]              gh/XuehaiPan/397/head       -> origin/gh/XuehaiPan/397/head
2025-12-04T09:17:18.1308174Z  * [new branch]              gh/XuehaiPan/397/orig       -> origin/gh/XuehaiPan/397/orig
2025-12-04T09:17:18.1310633Z  * [new branch]              gh/XuehaiPan/398/base       -> origin/gh/XuehaiPan/398/base
2025-12-04T09:17:18.1312180Z  * [new branch]              gh/XuehaiPan/398/head       -> origin/gh/XuehaiPan/398/head
2025-12-04T09:17:18.1314172Z  * [new branch]              gh/XuehaiPan/398/orig       -> origin/gh/XuehaiPan/398/orig
2025-12-04T09:17:18.1316760Z  * [new branch]              gh/XuehaiPan/399/base       -> origin/gh/XuehaiPan/399/base
2025-12-04T09:17:18.1318589Z  * [new branch]              gh/XuehaiPan/399/head       -> origin/gh/XuehaiPan/399/head
2025-12-04T09:17:18.1320395Z  * [new branch]              gh/XuehaiPan/399/orig       -> origin/gh/XuehaiPan/399/orig
2025-12-04T09:17:18.1323050Z  * [new branch]              gh/XuehaiPan/400/base       -> origin/gh/XuehaiPan/400/base
2025-12-04T09:17:18.1324918Z  * [new branch]              gh/XuehaiPan/400/head       -> origin/gh/XuehaiPan/400/head
2025-12-04T09:17:18.1326733Z  * [new branch]              gh/XuehaiPan/400/orig       -> origin/gh/XuehaiPan/400/orig
2025-12-04T09:17:18.1329847Z  * [new branch]              gh/ZhiweiYan-96/39/base     -> origin/gh/ZhiweiYan-96/39/base
2025-12-04T09:17:18.1331431Z  * [new branch]              gh/ZhiweiYan-96/39/head     -> origin/gh/ZhiweiYan-96/39/head
2025-12-04T09:17:18.1333462Z  * [new branch]              gh/ZhiweiYan-96/39/orig     -> origin/gh/ZhiweiYan-96/39/orig
2025-12-04T09:17:18.1336307Z  * [new branch]              gh/ZhiweiYan-96/44/base     -> origin/gh/ZhiweiYan-96/44/base
2025-12-04T09:17:18.1337734Z  * [new branch]              gh/ZhiweiYan-96/44/head     -> origin/gh/ZhiweiYan-96/44/head
2025-12-04T09:17:18.1340667Z  * [new branch]              gh/ZhiweiYan-96/45/base     -> origin/gh/ZhiweiYan-96/45/base
2025-12-04T09:17:18.1342279Z  * [new branch]              gh/ZhiweiYan-96/45/head     -> origin/gh/ZhiweiYan-96/45/head
2025-12-04T09:17:18.1345083Z  * [new branch]              gh/ZhiweiYan-96/49/base     -> origin/gh/ZhiweiYan-96/49/base
2025-12-04T09:17:18.1346962Z  * [new branch]              gh/ZhiweiYan-96/49/head     -> origin/gh/ZhiweiYan-96/49/head
2025-12-04T09:17:18.1349429Z  * [new branch]              gh/ZhiweiYan-96/62/base     -> origin/gh/ZhiweiYan-96/62/base
2025-12-04T09:17:18.1351246Z  * [new branch]              gh/ZhiweiYan-96/62/head     -> origin/gh/ZhiweiYan-96/62/head
2025-12-04T09:17:18.1353819Z  * [new branch]              gh/ZhiweiYan-96/66/base     -> origin/gh/ZhiweiYan-96/66/base
2025-12-04T09:17:18.1355688Z  * [new branch]              gh/ZhiweiYan-96/66/head     -> origin/gh/ZhiweiYan-96/66/head
2025-12-04T09:17:18.1358112Z  * [new branch]              gh/ZhiweiYan-96/67/base     -> origin/gh/ZhiweiYan-96/67/base
2025-12-04T09:17:18.1359688Z  * [new branch]              gh/ZhiweiYan-96/67/head     -> origin/gh/ZhiweiYan-96/67/head
2025-12-04T09:17:18.1362294Z  * [new branch]              gh/ZhiweiYan-96/68/base     -> origin/gh/ZhiweiYan-96/68/base
2025-12-04T09:17:18.1364080Z  * [new branch]              gh/ZhiweiYan-96/68/head     -> origin/gh/ZhiweiYan-96/68/head
2025-12-04T09:17:18.1366012Z  * [new branch]              gh/ZhiweiYan-96/68/orig     -> origin/gh/ZhiweiYan-96/68/orig
2025-12-04T09:17:18.1369259Z  * [new branch]              gh/aakhundov/1/base         -> origin/gh/aakhundov/1/base
2025-12-04T09:17:18.1371097Z  * [new branch]              gh/aakhundov/1/head         -> origin/gh/aakhundov/1/head
2025-12-04T09:17:18.1373473Z  * [new branch]              gh/aakhundov/2/base         -> origin/gh/aakhundov/2/base
2025-12-04T09:17:18.1375339Z  * [new branch]              gh/aakhundov/2/head         -> origin/gh/aakhundov/2/head
2025-12-04T09:17:18.1377774Z  * [new branch]              gh/aditew01/openblas        -> origin/gh/aditew01/openblas
2025-12-04T09:17:18.1379722Z  * [new branch]              gh/aditew01/sbgemm          -> origin/gh/aditew01/sbgemm
2025-12-04T09:17:18.1381591Z  * [new branch]              gh/aditew01/vecbf16         -> origin/gh/aditew01/vecbf16
2025-12-04T09:17:18.1384730Z  * [new branch]              gh/albanD/4/base            -> origin/gh/albanD/4/base
2025-12-04T09:17:18.1386224Z  * [new branch]              gh/albanD/4/head            -> origin/gh/albanD/4/head
2025-12-04T09:17:18.1388192Z  * [new branch]              gh/albanD/4/orig            -> origin/gh/albanD/4/orig
2025-12-04T09:17:18.1391070Z  * [new branch]              gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init
2025-12-04T09:17:18.1394138Z  * [new branch]              gh/alexsamardzic/12/base    -> origin/gh/alexsamardzic/12/base
2025-12-04T09:17:18.1395676Z  * [new branch]              gh/alexsamardzic/12/head    -> origin/gh/alexsamardzic/12/head
2025-12-04T09:17:18.1397701Z  * [new branch]              gh/alexsamardzic/12/orig    -> origin/gh/alexsamardzic/12/orig
2025-12-04T09:17:18.1400214Z  * [new branch]              gh/alexsamardzic/14/base    -> origin/gh/alexsamardzic/14/base
2025-12-04T09:17:18.1402173Z  * [new branch]              gh/alexsamardzic/14/head    -> origin/gh/alexsamardzic/14/head
2025-12-04T09:17:18.1403638Z  * [new branch]              gh/alexsamardzic/14/orig    -> origin/gh/alexsamardzic/14/orig
2025-12-04T09:17:18.1406379Z  * [new branch]              gh/alexsamardzic/15/base    -> origin/gh/alexsamardzic/15/base
2025-12-04T09:17:18.1408466Z  * [new branch]              gh/alexsamardzic/15/head    -> origin/gh/alexsamardzic/15/head
2025-12-04T09:17:18.1413309Z  * [new branch]              gh/alexsamardzic/15/orig    -> origin/gh/alexsamardzic/15/orig
2025-12-04T09:17:18.1416020Z  * [new branch]              gh/amjames/18/base          -> origin/gh/amjames/18/base
2025-12-04T09:17:18.1417934Z  * [new branch]              gh/amjames/18/head          -> origin/gh/amjames/18/head
2025-12-04T09:17:18.1420064Z  * [new branch]              gh/amjames/18/orig          -> origin/gh/amjames/18/orig
2025-12-04T09:17:18.1423253Z  * [new branch]              gh/andrewor14/35/base       -> origin/gh/andrewor14/35/base
2025-12-04T09:17:18.1425079Z  * [new branch]              gh/andrewor14/35/head       -> origin/gh/andrewor14/35/head
2025-12-04T09:17:18.1427027Z  * [new branch]              gh/andrewor14/35/orig       -> origin/gh/andrewor14/35/orig
2025-12-04T09:17:18.1429867Z  * [new branch]              gh/andrewor14/50/base       -> origin/gh/andrewor14/50/base
2025-12-04T09:17:18.1431693Z  * [new branch]              gh/andrewor14/50/head       -> origin/gh/andrewor14/50/head
2025-12-04T09:17:18.1433627Z  * [new branch]              gh/andrewor14/50/orig       -> origin/gh/andrewor14/50/orig
2025-12-04T09:17:18.1436679Z  * [new branch]              gh/andyanwang/30/base       -> origin/gh/andyanwang/30/base
2025-12-04T09:17:18.1438898Z  * [new branch]              gh/andyanwang/30/orig       -> origin/gh/andyanwang/30/orig
2025-12-04T09:17:18.1441509Z  * [new branch]              gh/andyanwang/31/base       -> origin/gh/andyanwang/31/base
2025-12-04T09:17:18.1443524Z  * [new branch]              gh/andyanwang/31/orig       -> origin/gh/andyanwang/31/orig
2025-12-04T09:17:18.1446040Z  * [new branch]              gh/andyanwang/39/base       -> origin/gh/andyanwang/39/base
2025-12-04T09:17:18.1447964Z  * [new branch]              gh/andyanwang/39/head       -> origin/gh/andyanwang/39/head
2025-12-04T09:17:18.1450252Z  * [new branch]              gh/andyanwang/39/orig       -> origin/gh/andyanwang/39/orig
2025-12-04T09:17:18.1452911Z  * [new branch]              gh/andyanwang/42/base       -> origin/gh/andyanwang/42/base
2025-12-04T09:17:18.1454427Z  * [new branch]              gh/andyanwang/42/head       -> origin/gh/andyanwang/42/head
2025-12-04T09:17:18.1456534Z  * [new branch]              gh/andyanwang/42/orig       -> origin/gh/andyanwang/42/orig
2025-12-04T09:17:18.1459382Z  * [new branch]              gh/andyanwang/45/base       -> origin/gh/andyanwang/45/base
2025-12-04T09:17:18.1461236Z  * [new branch]              gh/andyanwang/45/head       -> origin/gh/andyanwang/45/head
2025-12-04T09:17:18.1463026Z  * [new branch]              gh/andyanwang/45/orig       -> origin/gh/andyanwang/45/orig
2025-12-04T09:17:18.1466276Z  * [new branch]              gh/angelayi/107/base        -> origin/gh/angelayi/107/base
2025-12-04T09:17:18.1467743Z  * [new branch]              gh/angelayi/107/head        -> origin/gh/angelayi/107/head
2025-12-04T09:17:18.1470456Z  * [new branch]              gh/angelayi/114/base        -> origin/gh/angelayi/114/base
2025-12-04T09:17:18.1472411Z  * [new branch]              gh/angelayi/114/head        -> origin/gh/angelayi/114/head
2025-12-04T09:17:18.1474266Z  * [new branch]              gh/angelayi/114/orig        -> origin/gh/angelayi/114/orig
2025-12-04T09:17:18.1477014Z  * [new branch]              gh/angelayi/116/base        -> origin/gh/angelayi/116/base
2025-12-04T09:17:18.1478830Z  * [new branch]              gh/angelayi/116/head        -> origin/gh/angelayi/116/head
2025-12-04T09:17:18.1480637Z  * [new branch]              gh/angelayi/116/orig        -> origin/gh/angelayi/116/orig
2025-12-04T09:17:18.1483455Z  * [new branch]              gh/angelayi/122/base        -> origin/gh/angelayi/122/base
2025-12-04T09:17:18.1484966Z  * [new branch]              gh/angelayi/122/head        -> origin/gh/angelayi/122/head
2025-12-04T09:17:18.1486932Z  * [new branch]              gh/angelayi/122/orig        -> origin/gh/angelayi/122/orig
2025-12-04T09:17:18.1489577Z  * [new branch]              gh/angelayi/124/base        -> origin/gh/angelayi/124/base
2025-12-04T09:17:18.1491577Z  * [new branch]              gh/angelayi/124/head        -> origin/gh/angelayi/124/head
2025-12-04T09:17:18.1493352Z  * [new branch]              gh/angelayi/124/orig        -> origin/gh/angelayi/124/orig
2025-12-04T09:17:18.1495821Z  * [new branch]              gh/angelayi/128/base        -> origin/gh/angelayi/128/base
2025-12-04T09:17:18.1497502Z  * [new branch]              gh/angelayi/128/head        -> origin/gh/angelayi/128/head
2025-12-04T09:17:18.1499963Z  * [new branch]              gh/angelayi/128/orig        -> origin/gh/angelayi/128/orig
2025-12-04T09:17:18.1502659Z  * [new branch]              gh/angelayi/131/base        -> origin/gh/angelayi/131/base
2025-12-04T09:17:18.1503818Z  * [new branch]              gh/angelayi/131/head        -> origin/gh/angelayi/131/head
2025-12-04T09:17:18.1505881Z  * [new branch]              gh/angelayi/131/orig        -> origin/gh/angelayi/131/orig
2025-12-04T09:17:18.1508956Z  * [new branch]              gh/angelayi/132/base        -> origin/gh/angelayi/132/base
2025-12-04T09:17:18.1510825Z  * [new branch]              gh/angelayi/132/head        -> origin/gh/angelayi/132/head
2025-12-04T09:17:18.1512760Z  * [new branch]              gh/angelayi/132/orig        -> origin/gh/angelayi/132/orig
2025-12-04T09:17:18.1515274Z  * [new branch]              gh/angelayi/133/base        -> origin/gh/angelayi/133/base
2025-12-04T09:17:18.1517274Z  * [new branch]              gh/angelayi/133/head        -> origin/gh/angelayi/133/head
2025-12-04T09:17:18.1519124Z  * [new branch]              gh/angelayi/133/orig        -> origin/gh/angelayi/133/orig
2025-12-04T09:17:18.1521908Z  * [new branch]              gh/angelayi/134/base        -> origin/gh/angelayi/134/base
2025-12-04T09:17:18.1523863Z  * [new branch]              gh/angelayi/134/head        -> origin/gh/angelayi/134/head
2025-12-04T09:17:18.1525704Z  * [new branch]              gh/angelayi/134/orig        -> origin/gh/angelayi/134/orig
2025-12-04T09:17:18.1528372Z  * [new branch]              gh/angelayi/135/base        -> origin/gh/angelayi/135/base
2025-12-04T09:17:18.1530287Z  * [new branch]              gh/angelayi/135/head        -> origin/gh/angelayi/135/head
2025-12-04T09:17:18.1532087Z  * [new branch]              gh/angelayi/135/orig        -> origin/gh/angelayi/135/orig
2025-12-04T09:17:18.1534612Z  * [new branch]              gh/angelayi/136/base        -> origin/gh/angelayi/136/base
2025-12-04T09:17:18.1536198Z  * [new branch]              gh/angelayi/136/head        -> origin/gh/angelayi/136/head
2025-12-04T09:17:18.1538230Z  * [new branch]              gh/angelayi/136/orig        -> origin/gh/angelayi/136/orig
2025-12-04T09:17:18.1540965Z  * [new branch]              gh/angelayi/137/base        -> origin/gh/angelayi/137/base
2025-12-04T09:17:18.1542773Z  * [new branch]              gh/angelayi/137/head        -> origin/gh/angelayi/137/head
2025-12-04T09:17:18.1544851Z  * [new branch]              gh/angelayi/137/orig        -> origin/gh/angelayi/137/orig
2025-12-04T09:17:18.1547239Z  * [new branch]              gh/angelayi/138/base        -> origin/gh/angelayi/138/base
2025-12-04T09:17:18.1548790Z  * [new branch]              gh/angelayi/138/head        -> origin/gh/angelayi/138/head
2025-12-04T09:17:18.1550791Z  * [new branch]              gh/angelayi/138/orig        -> origin/gh/angelayi/138/orig
2025-12-04T09:17:18.1553259Z  * [new branch]              gh/angelayi/139/base        -> origin/gh/angelayi/139/base
2025-12-04T09:17:18.1555116Z  * [new branch]              gh/angelayi/139/head        -> origin/gh/angelayi/139/head
2025-12-04T09:17:18.1557001Z  * [new branch]              gh/angelayi/139/orig        -> origin/gh/angelayi/139/orig
2025-12-04T09:17:18.1559687Z  * [new branch]              gh/angelayi/140/base        -> origin/gh/angelayi/140/base
2025-12-04T09:17:18.1561538Z  * [new branch]              gh/angelayi/140/head        -> origin/gh/angelayi/140/head
2025-12-04T09:17:18.1563435Z  * [new branch]              gh/angelayi/140/orig        -> origin/gh/angelayi/140/orig
2025-12-04T09:17:18.1566700Z  * [new branch]              gh/angelayi/141/base        -> origin/gh/angelayi/141/base
2025-12-04T09:17:18.1568254Z  * [new branch]              gh/angelayi/141/head        -> origin/gh/angelayi/141/head
2025-12-04T09:17:18.1570294Z  * [new branch]              gh/angelayi/141/orig        -> origin/gh/angelayi/141/orig
2025-12-04T09:17:18.1572998Z  * [new branch]              gh/angelayi/142/base        -> origin/gh/angelayi/142/base
2025-12-04T09:17:18.1574793Z  * [new branch]              gh/angelayi/142/head        -> origin/gh/angelayi/142/head
2025-12-04T09:17:18.1576366Z  * [new branch]              gh/angelayi/142/orig        -> origin/gh/angelayi/142/orig
2025-12-04T09:17:18.1579257Z  * [new branch]              gh/angelayi/143/base        -> origin/gh/angelayi/143/base
2025-12-04T09:17:18.1581219Z  * [new branch]              gh/angelayi/143/head        -> origin/gh/angelayi/143/head
2025-12-04T09:17:18.1583020Z  * [new branch]              gh/angelayi/143/orig        -> origin/gh/angelayi/143/orig
2025-12-04T09:17:18.1585642Z  * [new branch]              gh/angelayi/144/base        -> origin/gh/angelayi/144/base
2025-12-04T09:17:18.1587631Z  * [new branch]              gh/angelayi/144/head        -> origin/gh/angelayi/144/head
2025-12-04T09:17:18.1589476Z  * [new branch]              gh/angelayi/144/orig        -> origin/gh/angelayi/144/orig
2025-12-04T09:17:18.1592832Z  * [new branch]              gh/anijain2305/753/base     -> origin/gh/anijain2305/753/base
2025-12-04T09:17:18.1594675Z  * [new branch]              gh/anijain2305/753/head     -> origin/gh/anijain2305/753/head
2025-12-04T09:17:18.1596199Z  * [new branch]              gh/anijain2305/753/orig     -> origin/gh/anijain2305/753/orig
2025-12-04T09:17:18.1599153Z  * [new branch]              gh/anijain2305/810/base     -> origin/gh/anijain2305/810/base
2025-12-04T09:17:18.1601014Z  * [new branch]              gh/anijain2305/810/head     -> origin/gh/anijain2305/810/head
2025-12-04T09:17:18.1602882Z  * [new branch]              gh/anijain2305/810/orig     -> origin/gh/anijain2305/810/orig
2025-12-04T09:17:18.1605411Z  * [new branch]              gh/anijain2305/854/base     -> origin/gh/anijain2305/854/base
2025-12-04T09:17:18.1607281Z  * [new branch]              gh/anijain2305/854/head     -> origin/gh/anijain2305/854/head
2025-12-04T09:17:18.1609352Z  * [new branch]              gh/anijain2305/854/orig     -> origin/gh/anijain2305/854/orig
2025-12-04T09:17:18.1639183Z  * [new branch]              gh/anijain2305/864/base     -> origin/gh/anijain2305/864/base
2025-12-04T09:17:18.1639967Z  * [new branch]              gh/anijain2305/864/head     -> origin/gh/anijain2305/864/head
2025-12-04T09:17:18.1640667Z  * [new branch]              gh/anijain2305/864/orig     -> origin/gh/anijain2305/864/orig
2025-12-04T09:17:18.1641411Z  * [new branch]              gh/anijain2305/870/base     -> origin/gh/anijain2305/870/base
2025-12-04T09:17:18.1642025Z  * [new branch]              gh/anijain2305/870/head     -> origin/gh/anijain2305/870/head
2025-12-04T09:17:18.1642610Z  * [new branch]              gh/anijain2305/870/orig     -> origin/gh/anijain2305/870/orig
2025-12-04T09:17:18.1643423Z  * [new branch]              gh/anijain2305/873/base     -> origin/gh/anijain2305/873/base
2025-12-04T09:17:18.1644195Z  * [new branch]              gh/anijain2305/873/head     -> origin/gh/anijain2305/873/head
2025-12-04T09:17:18.1644799Z  * [new branch]              gh/anijain2305/873/orig     -> origin/gh/anijain2305/873/orig
2025-12-04T09:17:18.1645551Z  * [new branch]              gh/anijain2305/894/base     -> origin/gh/anijain2305/894/base
2025-12-04T09:17:18.1646185Z  * [new branch]              gh/anijain2305/894/head     -> origin/gh/anijain2305/894/head
2025-12-04T09:17:18.1646825Z  * [new branch]              gh/anijain2305/894/orig     -> origin/gh/anijain2305/894/orig
2025-12-04T09:17:18.1647560Z  * [new branch]              gh/anijain2305/895/base     -> origin/gh/anijain2305/895/base
2025-12-04T09:17:18.1648147Z  * [new branch]              gh/anijain2305/895/head     -> origin/gh/anijain2305/895/head
2025-12-04T09:17:18.1649022Z  * [new branch]              gh/anijain2305/895/orig     -> origin/gh/anijain2305/895/orig
2025-12-04T09:17:18.1649766Z  * [new branch]              gh/anijain2305/910/base     -> origin/gh/anijain2305/910/base
2025-12-04T09:17:18.1650465Z  * [new branch]              gh/anijain2305/910/head     -> origin/gh/anijain2305/910/head
2025-12-04T09:17:18.1651247Z  * [new branch]              gh/anijain2305/910/orig     -> origin/gh/anijain2305/910/orig
2025-12-04T09:17:18.1652001Z  * [new branch]              gh/anijain2305/919/base     -> origin/gh/anijain2305/919/base
2025-12-04T09:17:18.1652602Z  * [new branch]              gh/anijain2305/919/head     -> origin/gh/anijain2305/919/head
2025-12-04T09:17:18.1653308Z  * [new branch]              gh/anijain2305/919/orig     -> origin/gh/anijain2305/919/orig
2025-12-04T09:17:18.1656064Z  * [new branch]              gh/anijain2305/922/base     -> origin/gh/anijain2305/922/base
2025-12-04T09:17:18.1657574Z  * [new branch]              gh/anijain2305/922/head     -> origin/gh/anijain2305/922/head
2025-12-04T09:17:18.1659759Z  * [new branch]              gh/anijain2305/922/orig     -> origin/gh/anijain2305/922/orig
2025-12-04T09:17:18.1662330Z  * [new branch]              gh/anijain2305/932/base     -> origin/gh/anijain2305/932/base
2025-12-04T09:17:18.1664314Z  * [new branch]              gh/anijain2305/932/head     -> origin/gh/anijain2305/932/head
2025-12-04T09:17:18.1666270Z  * [new branch]              gh/anijain2305/932/orig     -> origin/gh/anijain2305/932/orig
2025-12-04T09:17:18.1668803Z  * [new branch]              gh/anijain2305/940/base     -> origin/gh/anijain2305/940/base
2025-12-04T09:17:18.1670625Z  * [new branch]              gh/anijain2305/940/head     -> origin/gh/anijain2305/940/head
2025-12-04T09:17:18.1672456Z  * [new branch]              gh/anijain2305/940/orig     -> origin/gh/anijain2305/940/orig
2025-12-04T09:17:18.1675104Z  * [new branch]              gh/anijain2305/941/base     -> origin/gh/anijain2305/941/base
2025-12-04T09:17:18.1676939Z  * [new branch]              gh/anijain2305/941/head     -> origin/gh/anijain2305/941/head
2025-12-04T09:17:18.1678740Z  * [new branch]              gh/anijain2305/941/orig     -> origin/gh/anijain2305/941/orig
2025-12-04T09:17:18.1681302Z  * [new branch]              gh/anijain2305/942/base     -> origin/gh/anijain2305/942/base
2025-12-04T09:17:18.1683176Z  * [new branch]              gh/anijain2305/942/head     -> origin/gh/anijain2305/942/head
2025-12-04T09:17:18.1685081Z  * [new branch]              gh/anijain2305/942/orig     -> origin/gh/anijain2305/942/orig
2025-12-04T09:17:18.1687647Z  * [new branch]              gh/anijain2305/943/base     -> origin/gh/anijain2305/943/base
2025-12-04T09:17:18.1689489Z  * [new branch]              gh/anijain2305/943/head     -> origin/gh/anijain2305/943/head
2025-12-04T09:17:18.1691281Z  * [new branch]              gh/anijain2305/943/orig     -> origin/gh/anijain2305/943/orig
2025-12-04T09:17:18.1694515Z  * [new branch]              gh/anijain2305/944/base     -> origin/gh/anijain2305/944/base
2025-12-04T09:17:18.1696352Z  * [new branch]              gh/anijain2305/944/head     -> origin/gh/anijain2305/944/head
2025-12-04T09:17:18.1698623Z  * [new branch]              gh/anijain2305/944/orig     -> origin/gh/anijain2305/944/orig
2025-12-04T09:17:18.1701436Z  * [new branch]              gh/anijain2305/945/base     -> origin/gh/anijain2305/945/base
2025-12-04T09:17:18.1703285Z  * [new branch]              gh/anijain2305/945/head     -> origin/gh/anijain2305/945/head
2025-12-04T09:17:18.1705223Z  * [new branch]              gh/anijain2305/945/orig     -> origin/gh/anijain2305/945/orig
2025-12-04T09:17:18.1708149Z  * [new branch]              gh/anijain2305/946/base     -> origin/gh/anijain2305/946/base
2025-12-04T09:17:18.1710068Z  * [new branch]              gh/anijain2305/946/head     -> origin/gh/anijain2305/946/head
2025-12-04T09:17:18.1712017Z  * [new branch]              gh/anijain2305/946/orig     -> origin/gh/anijain2305/946/orig
2025-12-04T09:17:18.1714628Z  * [new branch]              gh/anijain2305/947/base     -> origin/gh/anijain2305/947/base
2025-12-04T09:17:18.1716050Z  * [new branch]              gh/anijain2305/947/head     -> origin/gh/anijain2305/947/head
2025-12-04T09:17:18.1718025Z  * [new branch]              gh/anijain2305/947/orig     -> origin/gh/anijain2305/947/orig
2025-12-04T09:17:18.1720885Z  * [new branch]              gh/anijain2305/948/base     -> origin/gh/anijain2305/948/base
2025-12-04T09:17:18.1722458Z  * [new branch]              gh/anijain2305/948/head     -> origin/gh/anijain2305/948/head
2025-12-04T09:17:18.1724395Z  * [new branch]              gh/anijain2305/948/orig     -> origin/gh/anijain2305/948/orig
2025-12-04T09:17:18.1727219Z  * [new branch]              gh/anijain2305/949/base     -> origin/gh/anijain2305/949/base
2025-12-04T09:17:18.1729124Z  * [new branch]              gh/anijain2305/949/head     -> origin/gh/anijain2305/949/head
2025-12-04T09:17:18.1730986Z  * [new branch]              gh/anijain2305/949/orig     -> origin/gh/anijain2305/949/orig
2025-12-04T09:17:18.1733572Z  * [new branch]              gh/anijain2305/950/base     -> origin/gh/anijain2305/950/base
2025-12-04T09:17:18.1735430Z  * [new branch]              gh/anijain2305/950/head     -> origin/gh/anijain2305/950/head
2025-12-04T09:17:18.1737029Z  * [new branch]              gh/anijain2305/950/orig     -> origin/gh/anijain2305/950/orig
2025-12-04T09:17:18.1740077Z  * [new branch]              gh/anijain2305/951/base     -> origin/gh/anijain2305/951/base
2025-12-04T09:17:18.1741601Z  * [new branch]              gh/anijain2305/951/head     -> origin/gh/anijain2305/951/head
2025-12-04T09:17:18.1743673Z  * [new branch]              gh/anijain2305/951/orig     -> origin/gh/anijain2305/951/orig
2025-12-04T09:17:18.1746396Z  * [new branch]              gh/anijain2305/952/base     -> origin/gh/anijain2305/952/base
2025-12-04T09:17:18.1748230Z  * [new branch]              gh/anijain2305/952/head     -> origin/gh/anijain2305/952/head
2025-12-04T09:17:18.1750044Z  * [new branch]              gh/anijain2305/952/orig     -> origin/gh/anijain2305/952/orig
2025-12-04T09:17:18.1752632Z  * [new branch]              gh/anijain2305/953/base     -> origin/gh/anijain2305/953/base
2025-12-04T09:17:18.1754494Z  * [new branch]              gh/anijain2305/953/head     -> origin/gh/anijain2305/953/head
2025-12-04T09:17:18.1756329Z  * [new branch]              gh/anijain2305/953/orig     -> origin/gh/anijain2305/953/orig
2025-12-04T09:17:18.1758907Z  * [new branch]              gh/anijain2305/954/base     -> origin/gh/anijain2305/954/base
2025-12-04T09:17:18.1760910Z  * [new branch]              gh/anijain2305/954/head     -> origin/gh/anijain2305/954/head
2025-12-04T09:17:18.1762732Z  * [new branch]              gh/anijain2305/954/orig     -> origin/gh/anijain2305/954/orig
2025-12-04T09:17:18.1765435Z  * [new branch]              gh/anijain2305/955/base     -> origin/gh/anijain2305/955/base
2025-12-04T09:17:18.1767282Z  * [new branch]              gh/anijain2305/955/head     -> origin/gh/anijain2305/955/head
2025-12-04T09:17:18.1769122Z  * [new branch]              gh/anijain2305/955/orig     -> origin/gh/anijain2305/955/orig
2025-12-04T09:17:18.1771943Z  * [new branch]              gh/anijain2305/956/base     -> origin/gh/anijain2305/956/base
2025-12-04T09:17:18.1773758Z  * [new branch]              gh/anijain2305/956/head     -> origin/gh/anijain2305/956/head
2025-12-04T09:17:18.1775578Z  * [new branch]              gh/anijain2305/956/orig     -> origin/gh/anijain2305/956/orig
2025-12-04T09:17:18.1778596Z  * [new branch]              gh/anijain2305/957/base     -> origin/gh/anijain2305/957/base
2025-12-04T09:17:18.1780641Z  * [new branch]              gh/anijain2305/957/head     -> origin/gh/anijain2305/957/head
2025-12-04T09:17:18.1782472Z  * [new branch]              gh/anijain2305/957/orig     -> origin/gh/anijain2305/957/orig
2025-12-04T09:17:18.1785639Z  * [new branch]              gh/anijain2305/958/base     -> origin/gh/anijain2305/958/base
2025-12-04T09:17:18.1787642Z  * [new branch]              gh/anijain2305/958/head     -> origin/gh/anijain2305/958/head
2025-12-04T09:17:18.1789126Z  * [new branch]              gh/anijain2305/958/orig     -> origin/gh/anijain2305/958/orig
2025-12-04T09:17:18.1791891Z  * [new branch]              gh/anijain2305/959/base     -> origin/gh/anijain2305/959/base
2025-12-04T09:17:18.1793617Z  * [new branch]              gh/anijain2305/959/head     -> origin/gh/anijain2305/959/head
2025-12-04T09:17:18.1795561Z  * [new branch]              gh/anijain2305/959/orig     -> origin/gh/anijain2305/959/orig
2025-12-04T09:17:18.1798357Z  * [new branch]              gh/anijain2305/960/base     -> origin/gh/anijain2305/960/base
2025-12-04T09:17:18.1800247Z  * [new branch]              gh/anijain2305/960/head     -> origin/gh/anijain2305/960/head
2025-12-04T09:17:18.1802113Z  * [new branch]              gh/anijain2305/960/orig     -> origin/gh/anijain2305/960/orig
2025-12-04T09:17:18.1804899Z  * [new branch]              gh/anijain2305/961/base     -> origin/gh/anijain2305/961/base
2025-12-04T09:17:18.1806703Z  * [new branch]              gh/anijain2305/961/head     -> origin/gh/anijain2305/961/head
2025-12-04T09:17:18.1808467Z  * [new branch]              gh/anijain2305/961/orig     -> origin/gh/anijain2305/961/orig
2025-12-04T09:17:18.1813311Z  * [new branch]              gh/anijain2305/962/base     -> origin/gh/anijain2305/962/base
2025-12-04T09:17:18.1814695Z  * [new branch]              gh/anijain2305/962/head     -> origin/gh/anijain2305/962/head
2025-12-04T09:17:18.1816812Z  * [new branch]              gh/anijain2305/962/orig     -> origin/gh/anijain2305/962/orig
2025-12-04T09:17:18.1819892Z  * [new branch]              gh/anijain2305/963/base     -> origin/gh/anijain2305/963/base
2025-12-04T09:17:18.1821648Z  * [new branch]              gh/anijain2305/963/head     -> origin/gh/anijain2305/963/head
2025-12-04T09:17:18.1823835Z  * [new branch]              gh/anijain2305/963/orig     -> origin/gh/anijain2305/963/orig
2025-12-04T09:17:18.1826543Z  * [new branch]              gh/anijain2305/964/base     -> origin/gh/anijain2305/964/base
2025-12-04T09:17:18.1827981Z  * [new branch]              gh/anijain2305/964/head     -> origin/gh/anijain2305/964/head
2025-12-04T09:17:18.1830059Z  * [new branch]              gh/anijain2305/964/orig     -> origin/gh/anijain2305/964/orig
2025-12-04T09:17:18.1832754Z  * [new branch]              gh/anijain2305/965/base     -> origin/gh/anijain2305/965/base
2025-12-04T09:17:18.1835011Z  * [new branch]              gh/anijain2305/965/head     -> origin/gh/anijain2305/965/head
2025-12-04T09:17:18.1837851Z  * [new branch]              gh/anijain2305/965/orig     -> origin/gh/anijain2305/965/orig
2025-12-04T09:17:18.1841353Z  * [new branch]              gh/anijain2305/966/base     -> origin/gh/anijain2305/966/base
2025-12-04T09:17:18.1844178Z  * [new branch]              gh/anijain2305/966/head     -> origin/gh/anijain2305/966/head
2025-12-04T09:17:18.1846822Z  * [new branch]              gh/anijain2305/966/orig     -> origin/gh/anijain2305/966/orig
2025-12-04T09:17:18.1850325Z  * [new branch]              gh/anijain2305/967/base     -> origin/gh/anijain2305/967/base
2025-12-04T09:17:18.1852820Z  * [new branch]              gh/anijain2305/967/head     -> origin/gh/anijain2305/967/head
2025-12-04T09:17:18.1855485Z  * [new branch]              gh/anijain2305/967/orig     -> origin/gh/anijain2305/967/orig
2025-12-04T09:17:18.1858836Z  * [new branch]              gh/anijain2305/968/base     -> origin/gh/anijain2305/968/base
2025-12-04T09:17:18.1861508Z  * [new branch]              gh/anijain2305/968/head     -> origin/gh/anijain2305/968/head
2025-12-04T09:17:18.1863882Z  * [new branch]              gh/anijain2305/968/orig     -> origin/gh/anijain2305/968/orig
2025-12-04T09:17:18.1867516Z  * [new branch]              gh/anijain2305/969/base     -> origin/gh/anijain2305/969/base
2025-12-04T09:17:18.1869989Z  * [new branch]              gh/anijain2305/969/head     -> origin/gh/anijain2305/969/head
2025-12-04T09:17:18.1873018Z  * [new branch]              gh/anijain2305/969/orig     -> origin/gh/anijain2305/969/orig
2025-12-04T09:17:18.1876212Z  * [new branch]              gh/anijain2305/970/base     -> origin/gh/anijain2305/970/base
2025-12-04T09:17:18.1877654Z  * [new branch]              gh/anijain2305/970/head     -> origin/gh/anijain2305/970/head
2025-12-04T09:17:18.1879591Z  * [new branch]              gh/anijain2305/970/orig     -> origin/gh/anijain2305/970/orig
2025-12-04T09:17:18.1883055Z  * [new branch]              gh/anjali411/216/base       -> origin/gh/anjali411/216/base
2025-12-04T09:17:18.1884848Z  * [new branch]              gh/anjali411/216/head       -> origin/gh/anjali411/216/head
2025-12-04T09:17:18.1886904Z  * [new branch]              gh/anjali411/216/orig       -> origin/gh/anjali411/216/orig
2025-12-04T09:17:18.1890246Z  * [new branch]              gh/anshul-si/1/base         -> origin/gh/anshul-si/1/base
2025-12-04T09:17:18.1891740Z  * [new branch]              gh/anshul-si/1/head         -> origin/gh/anshul-si/1/head
2025-12-04T09:17:18.1894337Z  * [new branch]              gh/anshul-si/2/base         -> origin/gh/anshul-si/2/base
2025-12-04T09:17:18.1895875Z  * [new branch]              gh/anshul-si/2/head         -> origin/gh/anshul-si/2/head
2025-12-04T09:17:18.1898484Z  * [new branch]              gh/anshul-si/3/base         -> origin/gh/anshul-si/3/base
2025-12-04T09:17:18.1900369Z  * [new branch]              gh/anshul-si/3/head         -> origin/gh/anshul-si/3/head
2025-12-04T09:17:18.1902817Z  * [new branch]              gh/anshul-si/4/base         -> origin/gh/anshul-si/4/base
2025-12-04T09:17:18.1904274Z  * [new branch]              gh/anshul-si/4/head         -> origin/gh/anshul-si/4/head
2025-12-04T09:17:18.1907003Z  * [new branch]              gh/anshul-si/5/base         -> origin/gh/anshul-si/5/base
2025-12-04T09:17:18.1908662Z  * [new branch]              gh/anshul-si/5/head         -> origin/gh/anshul-si/5/head
2025-12-04T09:17:18.1911792Z  * [new branch]              gh/anshul-si/53/base        -> origin/gh/anshul-si/53/base
2025-12-04T09:17:18.1913516Z  * [new branch]              gh/anshul-si/53/head        -> origin/gh/anshul-si/53/head
2025-12-04T09:17:18.1916339Z  * [new branch]              gh/anshul-si/58/base        -> origin/gh/anshul-si/58/base
2025-12-04T09:17:18.1917809Z  * [new branch]              gh/anshul-si/58/head        -> origin/gh/anshul-si/58/head
2025-12-04T09:17:18.1920530Z  * [new branch]              gh/anshul-si/66/base        -> origin/gh/anshul-si/66/base
2025-12-04T09:17:18.1922238Z  * [new branch]              gh/anshul-si/66/head        -> origin/gh/anshul-si/66/head
2025-12-04T09:17:18.1924261Z  * [new branch]              gh/anshul-si/66/orig        -> origin/gh/anshul-si/66/orig
2025-12-04T09:17:18.1926662Z  * [new branch]              gh/anshul-si/67/base        -> origin/gh/anshul-si/67/base
2025-12-04T09:17:18.1928144Z  * [new branch]              gh/anshul-si/67/head        -> origin/gh/anshul-si/67/head
2025-12-04T09:17:18.1930198Z  * [new branch]              gh/anshul-si/67/orig        -> origin/gh/anshul-si/67/orig
2025-12-04T09:17:18.1933004Z  * [new branch]              gh/anshul-si/68/base        -> origin/gh/anshul-si/68/base
2025-12-04T09:17:18.1934716Z  * [new branch]              gh/anshul-si/68/head        -> origin/gh/anshul-si/68/head
2025-12-04T09:17:18.1936755Z  * [new branch]              gh/anshul-si/68/orig        -> origin/gh/anshul-si/68/orig
2025-12-04T09:17:18.1939652Z  * [new branch]              gh/anshul-si/69/base        -> origin/gh/anshul-si/69/base
2025-12-04T09:17:18.1941112Z  * [new branch]              gh/anshul-si/69/head        -> origin/gh/anshul-si/69/head
2025-12-04T09:17:18.1943232Z  * [new branch]              gh/anshul-si/69/orig        -> origin/gh/anshul-si/69/orig
2025-12-04T09:17:18.1945873Z  * [new branch]              gh/anshul-si/70/base        -> origin/gh/anshul-si/70/base
2025-12-04T09:17:18.1947340Z  * [new branch]              gh/anshul-si/70/head        -> origin/gh/anshul-si/70/head
2025-12-04T09:17:18.1949678Z  * [new branch]              gh/anshul-si/70/orig        -> origin/gh/anshul-si/70/orig
2025-12-04T09:17:18.1952085Z  * [new branch]              gh/anshul-si/71/base        -> origin/gh/anshul-si/71/base
2025-12-04T09:17:18.1953805Z  * [new branch]              gh/anshul-si/71/head        -> origin/gh/anshul-si/71/head
2025-12-04T09:17:18.1955515Z  * [new branch]              gh/anshul-si/71/orig        -> origin/gh/anshul-si/71/orig
2025-12-04T09:17:18.1958313Z  * [new branch]              gh/anshul-si/72/base        -> origin/gh/anshul-si/72/base
2025-12-04T09:17:18.1960418Z  * [new branch]              gh/anshul-si/72/head        -> origin/gh/anshul-si/72/head
2025-12-04T09:17:18.1961899Z  * [new branch]              gh/anshul-si/72/orig        -> origin/gh/anshul-si/72/orig
2025-12-04T09:17:18.1964647Z  * [new branch]              gh/anshul-si/73/base        -> origin/gh/anshul-si/73/base
2025-12-04T09:17:18.1966667Z  * [new branch]              gh/anshul-si/73/head        -> origin/gh/anshul-si/73/head
2025-12-04T09:17:18.1968138Z  * [new branch]              gh/anshul-si/73/orig        -> origin/gh/anshul-si/73/orig
2025-12-04T09:17:18.1971573Z  * [new branch]              gh/aorenste/132/base        -> origin/gh/aorenste/132/base
2025-12-04T09:17:18.1973303Z  * [new branch]              gh/aorenste/132/head        -> origin/gh/aorenste/132/head
2025-12-04T09:17:18.1976223Z  * [new branch]              gh/aorenste/134/base        -> origin/gh/aorenste/134/base
2025-12-04T09:17:18.1978383Z  * [new branch]              gh/aorenste/134/head        -> origin/gh/aorenste/134/head
2025-12-04T09:17:18.1980507Z  * [new branch]              gh/aorenste/134/orig        -> origin/gh/aorenste/134/orig
2025-12-04T09:17:18.1983123Z  * [new branch]              gh/aorenste/139/base        -> origin/gh/aorenste/139/base
2025-12-04T09:17:18.1984846Z  * [new branch]              gh/aorenste/139/head        -> origin/gh/aorenste/139/head
2025-12-04T09:17:18.1986913Z  * [new branch]              gh/aorenste/139/orig        -> origin/gh/aorenste/139/orig
2025-12-04T09:17:18.1989453Z  * [new branch]              gh/aorenste/141/base        -> origin/gh/aorenste/141/base
2025-12-04T09:17:18.1991007Z  * [new branch]              gh/aorenste/141/head        -> origin/gh/aorenste/141/head
2025-12-04T09:17:18.1994021Z  * [new branch]              gh/aorenste/145/base        -> origin/gh/aorenste/145/base
2025-12-04T09:17:18.1995962Z  * [new branch]              gh/aorenste/145/head        -> origin/gh/aorenste/145/head
2025-12-04T09:17:18.1997977Z  * [new branch]              gh/aorenste/145/orig        -> origin/gh/aorenste/145/orig
2025-12-04T09:17:18.2000543Z  * [new branch]              gh/aorenste/146/base        -> origin/gh/aorenste/146/base
2025-12-04T09:17:18.2002500Z  * [new branch]              gh/aorenste/146/head        -> origin/gh/aorenste/146/head
2025-12-04T09:17:18.2004502Z  * [new branch]              gh/aorenste/146/orig        -> origin/gh/aorenste/146/orig
2025-12-04T09:17:18.2007066Z  * [new branch]              gh/aorenste/147/base        -> origin/gh/aorenste/147/base
2025-12-04T09:17:18.2008807Z  * [new branch]              gh/aorenste/147/head        -> origin/gh/aorenste/147/head
2025-12-04T09:17:18.2011095Z  * [new branch]              gh/aorenste/147/orig        -> origin/gh/aorenste/147/orig
2025-12-04T09:17:18.2013710Z  * [new branch]              gh/aorenste/148/base        -> origin/gh/aorenste/148/base
2025-12-04T09:17:18.2015231Z  * [new branch]              gh/aorenste/148/head        -> origin/gh/aorenste/148/head
2025-12-04T09:17:18.2017340Z  * [new branch]              gh/aorenste/148/orig        -> origin/gh/aorenste/148/orig
2025-12-04T09:17:18.2020133Z  * [new branch]              gh/aorenste/149/base        -> origin/gh/aorenste/149/base
2025-12-04T09:17:18.2021556Z  * [new branch]              gh/aorenste/149/head        -> origin/gh/aorenste/149/head
2025-12-04T09:17:18.2023677Z  * [new branch]              gh/aorenste/149/orig        -> origin/gh/aorenste/149/orig
2025-12-04T09:17:18.2026476Z  * [new branch]              gh/aorenste/150/base        -> origin/gh/aorenste/150/base
2025-12-04T09:17:18.2027768Z  * [new branch]              gh/aorenste/150/head        -> origin/gh/aorenste/150/head
2025-12-04T09:17:18.2029792Z  * [new branch]              gh/aorenste/150/orig        -> origin/gh/aorenste/150/orig
2025-12-04T09:17:18.2032315Z  * [new branch]              gh/aorenste/151/base        -> origin/gh/aorenste/151/base
2025-12-04T09:17:18.2033865Z  * [new branch]              gh/aorenste/151/head        -> origin/gh/aorenste/151/head
2025-12-04T09:17:18.2036094Z  * [new branch]              gh/aorenste/151/orig        -> origin/gh/aorenste/151/orig
2025-12-04T09:17:18.2038668Z  * [new branch]              gh/aorenste/152/base        -> origin/gh/aorenste/152/base
2025-12-04T09:17:18.2040206Z  * [new branch]              gh/aorenste/152/head        -> origin/gh/aorenste/152/head
2025-12-04T09:17:18.2042257Z  * [new branch]              gh/aorenste/152/orig        -> origin/gh/aorenste/152/orig
2025-12-04T09:17:18.2044733Z  * [new branch]              gh/aorenste/153/base        -> origin/gh/aorenste/153/base
2025-12-04T09:17:18.2046228Z  * [new branch]              gh/aorenste/153/head        -> origin/gh/aorenste/153/head
2025-12-04T09:17:18.2048340Z  * [new branch]              gh/aorenste/153/orig        -> origin/gh/aorenste/153/orig
2025-12-04T09:17:18.2050956Z  * [new branch]              gh/aorenste/154/base        -> origin/gh/aorenste/154/base
2025-12-04T09:17:18.2052395Z  * [new branch]              gh/aorenste/154/head        -> origin/gh/aorenste/154/head
2025-12-04T09:17:18.2054558Z  * [new branch]              gh/aorenste/154/orig        -> origin/gh/aorenste/154/orig
2025-12-04T09:17:18.2056868Z  * [new branch]              gh/aorenste/155/base        -> origin/gh/aorenste/155/base
2025-12-04T09:17:18.2058417Z  * [new branch]              gh/aorenste/155/head        -> origin/gh/aorenste/155/head
2025-12-04T09:17:18.2060748Z  * [new branch]              gh/aorenste/155/orig        -> origin/gh/aorenste/155/orig
2025-12-04T09:17:18.2062996Z  * [new branch]              gh/aorenste/156/base        -> origin/gh/aorenste/156/base
2025-12-04T09:17:18.2064770Z  * [new branch]              gh/aorenste/156/head        -> origin/gh/aorenste/156/head
2025-12-04T09:17:18.2066806Z  * [new branch]              gh/aorenste/156/orig        -> origin/gh/aorenste/156/orig
2025-12-04T09:17:18.2069648Z  * [new branch]              gh/aorenste/157/base        -> origin/gh/aorenste/157/base
2025-12-04T09:17:18.2071372Z  * [new branch]              gh/aorenste/157/head        -> origin/gh/aorenste/157/head
2025-12-04T09:17:18.2073293Z  * [new branch]              gh/aorenste/157/orig        -> origin/gh/aorenste/157/orig
2025-12-04T09:17:18.2075719Z  * [new branch]              gh/aorenste/158/base        -> origin/gh/aorenste/158/base
2025-12-04T09:17:18.2077443Z  * [new branch]              gh/aorenste/158/head        -> origin/gh/aorenste/158/head
2025-12-04T09:17:18.2079435Z  * [new branch]              gh/aorenste/158/orig        -> origin/gh/aorenste/158/orig
2025-12-04T09:17:18.2081855Z  * [new branch]              gh/aorenste/159/base        -> origin/gh/aorenste/159/base
2025-12-04T09:17:18.2083582Z  * [new branch]              gh/aorenste/159/head        -> origin/gh/aorenste/159/head
2025-12-04T09:17:18.2085517Z  * [new branch]              gh/aorenste/159/orig        -> origin/gh/aorenste/159/orig
2025-12-04T09:17:18.2088724Z  * [new branch]              gh/avikchaudhuri/1/base     -> origin/gh/avikchaudhuri/1/base
2025-12-04T09:17:18.2090551Z  * [new branch]              gh/avikchaudhuri/1/head     -> origin/gh/avikchaudhuri/1/head
2025-12-04T09:17:18.2093015Z  * [new branch]              gh/avikchaudhuri/2/base     -> origin/gh/avikchaudhuri/2/base
2025-12-04T09:17:18.2094464Z  * [new branch]              gh/avikchaudhuri/2/head     -> origin/gh/avikchaudhuri/2/head
2025-12-04T09:17:18.2096506Z  * [new branch]              gh/avikchaudhuri/2/orig     -> origin/gh/avikchaudhuri/2/orig
2025-12-04T09:17:18.2100255Z  * [new branch]              gh/bdhirsh/666/base         -> origin/gh/bdhirsh/666/base
2025-12-04T09:17:18.2101576Z  * [new branch]              gh/bdhirsh/666/head         -> origin/gh/bdhirsh/666/head
2025-12-04T09:17:18.2103815Z  * [new branch]              gh/bdhirsh/666/orig         -> origin/gh/bdhirsh/666/orig
2025-12-04T09:17:18.2106234Z  * [new branch]              gh/bdhirsh/668/base         -> origin/gh/bdhirsh/668/base
2025-12-04T09:17:18.2108476Z  * [new branch]              gh/bdhirsh/668/head         -> origin/gh/bdhirsh/668/head
2025-12-04T09:17:18.2109992Z  * [new branch]              gh/bdhirsh/668/orig         -> origin/gh/bdhirsh/668/orig
2025-12-04T09:17:18.2112822Z  * [new branch]              gh/bdhirsh/669/base         -> origin/gh/bdhirsh/669/base
2025-12-04T09:17:18.2114456Z  * [new branch]              gh/bdhirsh/669/head         -> origin/gh/bdhirsh/669/head
2025-12-04T09:17:18.2116267Z  * [new branch]              gh/bdhirsh/669/orig         -> origin/gh/bdhirsh/669/orig
2025-12-04T09:17:18.2118976Z  * [new branch]              gh/bdhirsh/670/base         -> origin/gh/bdhirsh/670/base
2025-12-04T09:17:18.2121231Z  * [new branch]              gh/bdhirsh/670/head         -> origin/gh/bdhirsh/670/head
2025-12-04T09:17:18.2123384Z  * [new branch]              gh/bdhirsh/670/orig         -> origin/gh/bdhirsh/670/orig
2025-12-04T09:17:18.2125817Z  * [new branch]              gh/bdhirsh/672/base         -> origin/gh/bdhirsh/672/base
2025-12-04T09:17:18.2127721Z  * [new branch]              gh/bdhirsh/672/head         -> origin/gh/bdhirsh/672/head
2025-12-04T09:17:18.2129701Z  * [new branch]              gh/bdhirsh/672/orig         -> origin/gh/bdhirsh/672/orig
2025-12-04T09:17:18.2132205Z  * [new branch]              gh/bdhirsh/675/base         -> origin/gh/bdhirsh/675/base
2025-12-04T09:17:18.2134248Z  * [new branch]              gh/bdhirsh/675/head         -> origin/gh/bdhirsh/675/head
2025-12-04T09:17:18.2136051Z  * [new branch]              gh/bdhirsh/675/orig         -> origin/gh/bdhirsh/675/orig
2025-12-04T09:17:18.2138564Z  * [new branch]              gh/bdhirsh/676/base         -> origin/gh/bdhirsh/676/base
2025-12-04T09:17:18.2140725Z  * [new branch]              gh/bdhirsh/676/head         -> origin/gh/bdhirsh/676/head
2025-12-04T09:17:18.2142518Z  * [new branch]              gh/bdhirsh/676/orig         -> origin/gh/bdhirsh/676/orig
2025-12-04T09:17:18.2145017Z  * [new branch]              gh/bdhirsh/677/base         -> origin/gh/bdhirsh/677/base
2025-12-04T09:17:18.2147206Z  * [new branch]              gh/bdhirsh/677/head         -> origin/gh/bdhirsh/677/head
2025-12-04T09:17:18.2149074Z  * [new branch]              gh/bdhirsh/677/orig         -> origin/gh/bdhirsh/677/orig
2025-12-04T09:17:18.2151780Z  * [new branch]              gh/bdhirsh/678/base         -> origin/gh/bdhirsh/678/base
2025-12-04T09:17:18.2153736Z  * [new branch]              gh/bdhirsh/678/head         -> origin/gh/bdhirsh/678/head
2025-12-04T09:17:18.2155591Z  * [new branch]              gh/bdhirsh/678/orig         -> origin/gh/bdhirsh/678/orig
2025-12-04T09:17:18.2158271Z  * [new branch]              gh/bdhirsh/679/base         -> origin/gh/bdhirsh/679/base
2025-12-04T09:17:18.2160307Z  * [new branch]              gh/bdhirsh/679/head         -> origin/gh/bdhirsh/679/head
2025-12-04T09:17:18.2162163Z  * [new branch]              gh/bdhirsh/679/orig         -> origin/gh/bdhirsh/679/orig
2025-12-04T09:17:18.2164769Z  * [new branch]              gh/bdhirsh/680/base         -> origin/gh/bdhirsh/680/base
2025-12-04T09:17:18.2166608Z  * [new branch]              gh/bdhirsh/680/head         -> origin/gh/bdhirsh/680/head
2025-12-04T09:17:18.2168469Z  * [new branch]              gh/bdhirsh/680/orig         -> origin/gh/bdhirsh/680/orig
2025-12-04T09:17:18.2170806Z  * [new branch]              gh/bdhirsh/681/base         -> origin/gh/bdhirsh/681/base
2025-12-04T09:17:18.2172798Z  * [new branch]              gh/bdhirsh/681/head         -> origin/gh/bdhirsh/681/head
2025-12-04T09:17:18.2174774Z  * [new branch]              gh/bdhirsh/681/orig         -> origin/gh/bdhirsh/681/orig
2025-12-04T09:17:18.2177712Z  * [new branch]              gh/benjaminglass1/101/base  -> origin/gh/benjaminglass1/101/base
2025-12-04T09:17:18.2179713Z  * [new branch]              gh/benjaminglass1/101/head  -> origin/gh/benjaminglass1/101/head
2025-12-04T09:17:18.2181600Z  * [new branch]              gh/benjaminglass1/101/orig  -> origin/gh/benjaminglass1/101/orig
2025-12-04T09:17:18.2184441Z  * [new branch]              gh/benjaminglass1/102/base  -> origin/gh/benjaminglass1/102/base
2025-12-04T09:17:18.2186149Z  * [new branch]              gh/benjaminglass1/102/head  -> origin/gh/benjaminglass1/102/head
2025-12-04T09:17:18.2187976Z  * [new branch]              gh/benjaminglass1/102/orig  -> origin/gh/benjaminglass1/102/orig
2025-12-04T09:17:18.2190556Z  * [new branch]              gh/benjaminglass1/106/base  -> origin/gh/benjaminglass1/106/base
2025-12-04T09:17:18.2192405Z  * [new branch]              gh/benjaminglass1/106/head  -> origin/gh/benjaminglass1/106/head
2025-12-04T09:17:18.2194065Z  * [new branch]              gh/benjaminglass1/106/orig  -> origin/gh/benjaminglass1/106/orig
2025-12-04T09:17:18.2196609Z  * [new branch]              gh/benjaminglass1/107/base  -> origin/gh/benjaminglass1/107/base
2025-12-04T09:17:18.2198448Z  * [new branch]              gh/benjaminglass1/107/head  -> origin/gh/benjaminglass1/107/head
2025-12-04T09:17:18.2200313Z  * [new branch]              gh/benjaminglass1/107/orig  -> origin/gh/benjaminglass1/107/orig
2025-12-04T09:17:18.2202808Z  * [new branch]              gh/benjaminglass1/108/base  -> origin/gh/benjaminglass1/108/base
2025-12-04T09:17:18.2204652Z  * [new branch]              gh/benjaminglass1/108/head  -> origin/gh/benjaminglass1/108/head
2025-12-04T09:17:18.2206459Z  * [new branch]              gh/benjaminglass1/108/orig  -> origin/gh/benjaminglass1/108/orig
2025-12-04T09:17:18.2209423Z  * [new branch]              gh/benjaminglass1/109/base  -> origin/gh/benjaminglass1/109/base
2025-12-04T09:17:18.2211128Z  * [new branch]              gh/benjaminglass1/109/head  -> origin/gh/benjaminglass1/109/head
2025-12-04T09:17:18.2212956Z  * [new branch]              gh/benjaminglass1/109/orig  -> origin/gh/benjaminglass1/109/orig
2025-12-04T09:17:18.2215497Z  * [new branch]              gh/benjaminglass1/97/base   -> origin/gh/benjaminglass1/97/base
2025-12-04T09:17:18.2217294Z  * [new branch]              gh/benjaminglass1/97/head   -> origin/gh/benjaminglass1/97/head
2025-12-04T09:17:18.2219258Z  * [new branch]              gh/benjaminglass1/97/orig   -> origin/gh/benjaminglass1/97/orig
2025-12-04T09:17:18.2222925Z  * [new branch]              gh/bobrenjc93/570/base      -> origin/gh/bobrenjc93/570/base
2025-12-04T09:17:18.2225122Z  * [new branch]              gh/bobrenjc93/570/head      -> origin/gh/bobrenjc93/570/head
2025-12-04T09:17:18.2226627Z  * [new branch]              gh/bobrenjc93/570/orig      -> origin/gh/bobrenjc93/570/orig
2025-12-04T09:17:18.2229268Z  * [new branch]              gh/bobrenjc93/604/base      -> origin/gh/bobrenjc93/604/base
2025-12-04T09:17:18.2231195Z  * [new branch]              gh/bobrenjc93/604/head      -> origin/gh/bobrenjc93/604/head
2025-12-04T09:17:18.2233016Z  * [new branch]              gh/bobrenjc93/604/orig      -> origin/gh/bobrenjc93/604/orig
2025-12-04T09:17:18.2235675Z  * [new branch]              gh/bobrenjc93/638/base      -> origin/gh/bobrenjc93/638/base
2025-12-04T09:17:18.2237480Z  * [new branch]              gh/bobrenjc93/638/head      -> origin/gh/bobrenjc93/638/head
2025-12-04T09:17:18.2239301Z  * [new branch]              gh/bobrenjc93/638/orig      -> origin/gh/bobrenjc93/638/orig
2025-12-04T09:17:18.2241808Z  * [new branch]              gh/bobrenjc93/653/base      -> origin/gh/bobrenjc93/653/base
2025-12-04T09:17:18.2243667Z  * [new branch]              gh/bobrenjc93/653/head      -> origin/gh/bobrenjc93/653/head
2025-12-04T09:17:18.2245503Z  * [new branch]              gh/bobrenjc93/653/orig      -> origin/gh/bobrenjc93/653/orig
2025-12-04T09:17:18.2248343Z  * [new branch]              gh/bobrenjc93/654/base      -> origin/gh/bobrenjc93/654/base
2025-12-04T09:17:18.2250171Z  * [new branch]              gh/bobrenjc93/654/head      -> origin/gh/bobrenjc93/654/head
2025-12-04T09:17:18.2251895Z  * [new branch]              gh/bobrenjc93/654/orig      -> origin/gh/bobrenjc93/654/orig
2025-12-04T09:17:18.2254448Z  * [new branch]              gh/bobrenjc93/657/base      -> origin/gh/bobrenjc93/657/base
2025-12-04T09:17:18.2256238Z  * [new branch]              gh/bobrenjc93/657/head      -> origin/gh/bobrenjc93/657/head
2025-12-04T09:17:18.2258029Z  * [new branch]              gh/bobrenjc93/657/orig      -> origin/gh/bobrenjc93/657/orig
2025-12-04T09:17:18.2261061Z  * [new branch]              gh/bobrenjc93/672/base      -> origin/gh/bobrenjc93/672/base
2025-12-04T09:17:18.2262678Z  * [new branch]              gh/bobrenjc93/672/head      -> origin/gh/bobrenjc93/672/head
2025-12-04T09:17:18.2264477Z  * [new branch]              gh/bobrenjc93/672/orig      -> origin/gh/bobrenjc93/672/orig
2025-12-04T09:17:18.2267029Z  * [new branch]              gh/bobrenjc93/679/base      -> origin/gh/bobrenjc93/679/base
2025-12-04T09:17:18.2269161Z  * [new branch]              gh/bobrenjc93/679/head      -> origin/gh/bobrenjc93/679/head
2025-12-04T09:17:18.2270919Z  * [new branch]              gh/bobrenjc93/679/orig      -> origin/gh/bobrenjc93/679/orig
2025-12-04T09:17:18.2273439Z  * [new branch]              gh/bobrenjc93/680/base      -> origin/gh/bobrenjc93/680/base
2025-12-04T09:17:18.2275324Z  * [new branch]              gh/bobrenjc93/680/head      -> origin/gh/bobrenjc93/680/head
2025-12-04T09:17:18.2277128Z  * [new branch]              gh/bobrenjc93/680/orig      -> origin/gh/bobrenjc93/680/orig
2025-12-04T09:17:18.2279544Z  * [new branch]              gh/bobrenjc93/681/base      -> origin/gh/bobrenjc93/681/base
2025-12-04T09:17:18.2281494Z  * [new branch]              gh/bobrenjc93/681/head      -> origin/gh/bobrenjc93/681/head
2025-12-04T09:17:18.2283289Z  * [new branch]              gh/bobrenjc93/681/orig      -> origin/gh/bobrenjc93/681/orig
2025-12-04T09:17:18.2285679Z  * [new branch]              gh/bobrenjc93/682/base      -> origin/gh/bobrenjc93/682/base
2025-12-04T09:17:18.2287618Z  * [new branch]              gh/bobrenjc93/682/head      -> origin/gh/bobrenjc93/682/head
2025-12-04T09:17:18.2289417Z  * [new branch]              gh/bobrenjc93/682/orig      -> origin/gh/bobrenjc93/682/orig
2025-12-04T09:17:18.2292127Z  * [new branch]              gh/bobrenjc93/683/base      -> origin/gh/bobrenjc93/683/base
2025-12-04T09:17:18.2293661Z  * [new branch]              gh/bobrenjc93/683/head      -> origin/gh/bobrenjc93/683/head
2025-12-04T09:17:18.2295454Z  * [new branch]              gh/bobrenjc93/683/orig      -> origin/gh/bobrenjc93/683/orig
2025-12-04T09:17:18.2297972Z  * [new branch]              gh/bobrenjc93/684/base      -> origin/gh/bobrenjc93/684/base
2025-12-04T09:17:18.2300178Z  * [new branch]              gh/bobrenjc93/684/head      -> origin/gh/bobrenjc93/684/head
2025-12-04T09:17:18.2302193Z  * [new branch]              gh/bobrenjc93/684/orig      -> origin/gh/bobrenjc93/684/orig
2025-12-04T09:17:18.2304791Z  * [new branch]              gh/bobrenjc93/685/base      -> origin/gh/bobrenjc93/685/base
2025-12-04T09:17:18.2306554Z  * [new branch]              gh/bobrenjc93/685/head      -> origin/gh/bobrenjc93/685/head
2025-12-04T09:17:18.2308873Z  * [new branch]              gh/bobrenjc93/685/orig      -> origin/gh/bobrenjc93/685/orig
2025-12-04T09:17:18.2314281Z  * [new branch]              gh/bobrenjc93/686/base      -> origin/gh/bobrenjc93/686/base
2025-12-04T09:17:18.2316817Z  * [new branch]              gh/bobrenjc93/686/head      -> origin/gh/bobrenjc93/686/head
2025-12-04T09:17:18.2318908Z  * [new branch]              gh/bobrenjc93/686/orig      -> origin/gh/bobrenjc93/686/orig
2025-12-04T09:17:18.2320962Z  * [new branch]              gh/bobrenjc93/687/base      -> origin/gh/bobrenjc93/687/base
2025-12-04T09:17:18.2323322Z  * [new branch]              gh/bobrenjc93/687/head      -> origin/gh/bobrenjc93/687/head
2025-12-04T09:17:18.2325027Z  * [new branch]              gh/bobrenjc93/687/orig      -> origin/gh/bobrenjc93/687/orig
2025-12-04T09:17:18.2328038Z  * [new branch]              gh/bobrenjc93/688/base      -> origin/gh/bobrenjc93/688/base
2025-12-04T09:17:18.2329911Z  * [new branch]              gh/bobrenjc93/688/head      -> origin/gh/bobrenjc93/688/head
2025-12-04T09:17:18.2331848Z  * [new branch]              gh/bobrenjc93/688/orig      -> origin/gh/bobrenjc93/688/orig
2025-12-04T09:17:18.2334166Z  * [new branch]              gh/bobrenjc93/689/base      -> origin/gh/bobrenjc93/689/base
2025-12-04T09:17:18.2336088Z  * [new branch]              gh/bobrenjc93/689/head      -> origin/gh/bobrenjc93/689/head
2025-12-04T09:17:18.2337915Z  * [new branch]              gh/bobrenjc93/689/orig      -> origin/gh/bobrenjc93/689/orig
2025-12-04T09:17:18.2341071Z  * [new branch]              gh/bobrenjc93/690/base      -> origin/gh/bobrenjc93/690/base
2025-12-04T09:17:18.2342464Z  * [new branch]              gh/bobrenjc93/690/head      -> origin/gh/bobrenjc93/690/head
2025-12-04T09:17:18.2344328Z  * [new branch]              gh/bobrenjc93/690/orig      -> origin/gh/bobrenjc93/690/orig
2025-12-04T09:17:18.2347511Z  * [new branch]              gh/bobrenjc93/691/base      -> origin/gh/bobrenjc93/691/base
2025-12-04T09:17:18.2349599Z  * [new branch]              gh/bobrenjc93/691/head      -> origin/gh/bobrenjc93/691/head
2025-12-04T09:17:18.2351802Z  * [new branch]              gh/bobrenjc93/691/orig      -> origin/gh/bobrenjc93/691/orig
2025-12-04T09:17:18.2355076Z  * [new branch]              gh/bobrenjc93/692/base      -> origin/gh/bobrenjc93/692/base
2025-12-04T09:17:18.2356891Z  * [new branch]              gh/bobrenjc93/692/head      -> origin/gh/bobrenjc93/692/head
2025-12-04T09:17:18.2358700Z  * [new branch]              gh/bobrenjc93/692/orig      -> origin/gh/bobrenjc93/692/orig
2025-12-04T09:17:18.2361154Z  * [new branch]              gh/bobrenjc93/693/base      -> origin/gh/bobrenjc93/693/base
2025-12-04T09:17:18.2363025Z  * [new branch]              gh/bobrenjc93/693/head      -> origin/gh/bobrenjc93/693/head
2025-12-04T09:17:18.2364920Z  * [new branch]              gh/bobrenjc93/693/orig      -> origin/gh/bobrenjc93/693/orig
2025-12-04T09:17:18.2367625Z  * [new branch]              gh/bobrenjc93/694/base      -> origin/gh/bobrenjc93/694/base
2025-12-04T09:17:18.2369536Z  * [new branch]              gh/bobrenjc93/694/head      -> origin/gh/bobrenjc93/694/head
2025-12-04T09:17:18.2371346Z  * [new branch]              gh/bobrenjc93/694/orig      -> origin/gh/bobrenjc93/694/orig
2025-12-04T09:17:18.2373780Z  * [new branch]              gh/bobrenjc93/695/base      -> origin/gh/bobrenjc93/695/base
2025-12-04T09:17:18.2375581Z  * [new branch]              gh/bobrenjc93/695/head      -> origin/gh/bobrenjc93/695/head
2025-12-04T09:17:18.2377415Z  * [new branch]              gh/bobrenjc93/695/orig      -> origin/gh/bobrenjc93/695/orig
2025-12-04T09:17:18.2380838Z  * [new branch]              gh/c00w/23/base             -> origin/gh/c00w/23/base
2025-12-04T09:17:18.2382761Z  * [new branch]              gh/c00w/23/head             -> origin/gh/c00w/23/head
2025-12-04T09:17:18.2385254Z  * [new branch]              gh/c00w/53/base             -> origin/gh/c00w/53/base
2025-12-04T09:17:18.2387028Z  * [new branch]              gh/c00w/53/head             -> origin/gh/c00w/53/head
2025-12-04T09:17:18.2388820Z  * [new branch]              gh/c00w/53/orig             -> origin/gh/c00w/53/orig
2025-12-04T09:17:18.2391340Z  * [new branch]              gh/c00w/54/base             -> origin/gh/c00w/54/base
2025-12-04T09:17:18.2393220Z  * [new branch]              gh/c00w/54/head             -> origin/gh/c00w/54/head
2025-12-04T09:17:18.2395167Z  * [new branch]              gh/c00w/54/orig             -> origin/gh/c00w/54/orig
2025-12-04T09:17:18.2397585Z  * [new branch]              gh/c00w/56/base             -> origin/gh/c00w/56/base
2025-12-04T09:17:18.2399588Z  * [new branch]              gh/c00w/56/head             -> origin/gh/c00w/56/head
2025-12-04T09:17:18.2401412Z  * [new branch]              gh/c00w/56/orig             -> origin/gh/c00w/56/orig
2025-12-04T09:17:18.2403750Z  * [new branch]              gh/c00w/57/base             -> origin/gh/c00w/57/base
2025-12-04T09:17:18.2405593Z  * [new branch]              gh/c00w/57/head             -> origin/gh/c00w/57/head
2025-12-04T09:17:18.2407464Z  * [new branch]              gh/c00w/57/orig             -> origin/gh/c00w/57/orig
2025-12-04T09:17:18.2410256Z  * [new branch]              gh/c00w/58/base             -> origin/gh/c00w/58/base
2025-12-04T09:17:18.2411890Z  * [new branch]              gh/c00w/58/head             -> origin/gh/c00w/58/head
2025-12-04T09:17:18.2413690Z  * [new branch]              gh/c00w/58/orig             -> origin/gh/c00w/58/orig
2025-12-04T09:17:18.2416936Z  * [new branch]              gh/clee2000/1/base          -> origin/gh/clee2000/1/base
2025-12-04T09:17:18.2418825Z  * [new branch]              gh/clee2000/1/head          -> origin/gh/clee2000/1/head
2025-12-04T09:17:18.2420865Z  * [new branch]              gh/clee2000/1/orig          -> origin/gh/clee2000/1/orig
2025-12-04T09:17:18.2424040Z  * [new branch]              gh/coconutruben/1/base      -> origin/gh/coconutruben/1/base
2025-12-04T09:17:18.2426022Z  * [new branch]              gh/coconutruben/1/head      -> origin/gh/coconutruben/1/head
2025-12-04T09:17:18.2428855Z  * [new branch]              gh/coconutruben/55/base     -> origin/gh/coconutruben/55/base
2025-12-04T09:17:18.2430622Z  * [new branch]              gh/coconutruben/55/head     -> origin/gh/coconutruben/55/head
2025-12-04T09:17:18.2432497Z  * [new branch]              gh/coconutruben/55/orig     -> origin/gh/coconutruben/55/orig
2025-12-04T09:17:18.2435136Z  * [new branch]              gh/coconutruben/57/base     -> origin/gh/coconutruben/57/base
2025-12-04T09:17:18.2437121Z  * [new branch]              gh/coconutruben/57/head     -> origin/gh/coconutruben/57/head
2025-12-04T09:17:18.2439107Z  * [new branch]              gh/coconutruben/57/orig     -> origin/gh/coconutruben/57/orig
2025-12-04T09:17:18.2441719Z  * [new branch]              gh/coconutruben/70/base     -> origin/gh/coconutruben/70/base
2025-12-04T09:17:18.2443566Z  * [new branch]              gh/coconutruben/70/head     -> origin/gh/coconutruben/70/head
2025-12-04T09:17:18.2445507Z  * [new branch]              gh/coconutruben/70/orig     -> origin/gh/coconutruben/70/orig
2025-12-04T09:17:18.2447858Z  * [new branch]              gh/coconutruben/71/base     -> origin/gh/coconutruben/71/base
2025-12-04T09:17:18.2449744Z  * [new branch]              gh/coconutruben/71/head     -> origin/gh/coconutruben/71/head
2025-12-04T09:17:18.2451611Z  * [new branch]              gh/coconutruben/71/orig     -> origin/gh/coconutruben/71/orig
2025-12-04T09:17:18.2454558Z  * [new branch]              gh/coconutruben/72/base     -> origin/gh/coconutruben/72/base
2025-12-04T09:17:18.2456212Z  * [new branch]              gh/coconutruben/72/head     -> origin/gh/coconutruben/72/head
2025-12-04T09:17:18.2458152Z  * [new branch]              gh/coconutruben/72/orig     -> origin/gh/coconutruben/72/orig
2025-12-04T09:17:18.2460830Z  * [new branch]              gh/coconutruben/73/base     -> origin/gh/coconutruben/73/base
2025-12-04T09:17:18.2462564Z  * [new branch]              gh/coconutruben/73/head     -> origin/gh/coconutruben/73/head
2025-12-04T09:17:18.2464483Z  * [new branch]              gh/coconutruben/73/orig     -> origin/gh/coconutruben/73/orig
2025-12-04T09:17:18.2467132Z  * [new branch]              gh/coconutruben/74/base     -> origin/gh/coconutruben/74/base
2025-12-04T09:17:18.2469117Z  * [new branch]              gh/coconutruben/74/head     -> origin/gh/coconutruben/74/head
2025-12-04T09:17:18.2470952Z  * [new branch]              gh/coconutruben/74/orig     -> origin/gh/coconutruben/74/orig
2025-12-04T09:17:18.2473590Z  * [new branch]              gh/coconutruben/79/base     -> origin/gh/coconutruben/79/base
2025-12-04T09:17:18.2475665Z  * [new branch]              gh/coconutruben/79/head     -> origin/gh/coconutruben/79/head
2025-12-04T09:17:18.2477371Z  * [new branch]              gh/coconutruben/79/orig     -> origin/gh/coconutruben/79/orig
2025-12-04T09:17:18.2480121Z  * [new branch]              gh/coconutruben/80/base     -> origin/gh/coconutruben/80/base
2025-12-04T09:17:18.2482746Z  * [new branch]              gh/coconutruben/80/head     -> origin/gh/coconutruben/80/head
2025-12-04T09:17:18.2484084Z  * [new branch]              gh/coconutruben/80/orig     -> origin/gh/coconutruben/80/orig
2025-12-04T09:17:18.2486715Z  * [new branch]              gh/coconutruben/82/base     -> origin/gh/coconutruben/82/base
2025-12-04T09:17:18.2488493Z  * [new branch]              gh/coconutruben/82/head     -> origin/gh/coconutruben/82/head
2025-12-04T09:17:18.2490411Z  * [new branch]              gh/coconutruben/82/orig     -> origin/gh/coconutruben/82/orig
2025-12-04T09:17:18.2493070Z  * [new branch]              gh/coconutruben/83/base     -> origin/gh/coconutruben/83/base
2025-12-04T09:17:18.2494827Z  * [new branch]              gh/coconutruben/83/head     -> origin/gh/coconutruben/83/head
2025-12-04T09:17:18.2496647Z  * [new branch]              gh/coconutruben/83/orig     -> origin/gh/coconutruben/83/orig
2025-12-04T09:17:18.2500208Z  * [new branch]              gh/coconutruben/84/base     -> origin/gh/coconutruben/84/base
2025-12-04T09:17:18.2501910Z  * [new branch]              gh/coconutruben/84/head     -> origin/gh/coconutruben/84/head
2025-12-04T09:17:18.2503701Z  * [new branch]              gh/coconutruben/84/orig     -> origin/gh/coconutruben/84/orig
2025-12-04T09:17:18.2506237Z  * [new branch]              gh/coconutruben/85/base     -> origin/gh/coconutruben/85/base
2025-12-04T09:17:18.2508229Z  * [new branch]              gh/coconutruben/85/head     -> origin/gh/coconutruben/85/head
2025-12-04T09:17:18.2510310Z  * [new branch]              gh/coconutruben/85/orig     -> origin/gh/coconutruben/85/orig
2025-12-04T09:17:18.2513012Z  * [new branch]              gh/coconutruben/86/base     -> origin/gh/coconutruben/86/base
2025-12-04T09:17:18.2515037Z  * [new branch]              gh/coconutruben/86/head     -> origin/gh/coconutruben/86/head
2025-12-04T09:17:18.2516653Z  * [new branch]              gh/coconutruben/86/orig     -> origin/gh/coconutruben/86/orig
2025-12-04T09:17:18.2519723Z  * [new branch]              gh/colinchan15/1/base       -> origin/gh/colinchan15/1/base
2025-12-04T09:17:18.2521578Z  * [new branch]              gh/colinchan15/1/head       -> origin/gh/colinchan15/1/head
2025-12-04T09:17:18.2523966Z  * [new branch]              gh/colinchan15/2/base       -> origin/gh/colinchan15/2/base
2025-12-04T09:17:18.2525796Z  * [new branch]              gh/colinchan15/2/head       -> origin/gh/colinchan15/2/head
2025-12-04T09:17:18.2528273Z  * [new branch]              gh/colinchan15/3/base       -> origin/gh/colinchan15/3/base
2025-12-04T09:17:18.2530029Z  * [new branch]              gh/colinchan15/3/head       -> origin/gh/colinchan15/3/head
2025-12-04T09:17:18.2532349Z  * [new branch]              gh/colinchan15/6/base       -> origin/gh/colinchan15/6/base
2025-12-04T09:17:18.2534149Z  * [new branch]              gh/colinchan15/6/head       -> origin/gh/colinchan15/6/head
2025-12-04T09:17:18.2537300Z  * [new branch]              gh/d4l3k/1/base             -> origin/gh/d4l3k/1/base
2025-12-04T09:17:18.2539184Z  * [new branch]              gh/d4l3k/1/head             -> origin/gh/d4l3k/1/head
2025-12-04T09:17:18.2541777Z  * [new branch]              gh/d4l3k/2/base             -> origin/gh/d4l3k/2/base
2025-12-04T09:17:18.2543608Z  * [new branch]              gh/d4l3k/2/head             -> origin/gh/d4l3k/2/head
2025-12-04T09:17:18.2545394Z  * [new branch]              gh/d4l3k/2/orig             -> origin/gh/d4l3k/2/orig
2025-12-04T09:17:18.2547887Z  * [new branch]              gh/d4l3k/3/base             -> origin/gh/d4l3k/3/base
2025-12-04T09:17:18.2549715Z  * [new branch]              gh/d4l3k/3/head             -> origin/gh/d4l3k/3/head
2025-12-04T09:17:18.2551690Z  * [new branch]              gh/d4l3k/3/orig             -> origin/gh/d4l3k/3/orig
2025-12-04T09:17:18.2554095Z  * [new branch]              gh/d4l3k/4/base             -> origin/gh/d4l3k/4/base
2025-12-04T09:17:18.2555914Z  * [new branch]              gh/d4l3k/4/head             -> origin/gh/d4l3k/4/head
2025-12-04T09:17:18.2557830Z  * [new branch]              gh/d4l3k/4/orig             -> origin/gh/d4l3k/4/orig
2025-12-04T09:17:18.2560347Z  * [new branch]              gh/d4l3k/5/base             -> origin/gh/d4l3k/5/base
2025-12-04T09:17:18.2562283Z  * [new branch]              gh/d4l3k/5/orig             -> origin/gh/d4l3k/5/orig
2025-12-04T09:17:18.2565397Z  * [new branch]              gh/davidberard98/392/base   -> origin/gh/davidberard98/392/base
2025-12-04T09:17:18.2567290Z  * [new branch]              gh/davidberard98/392/head   -> origin/gh/davidberard98/392/head
2025-12-04T09:17:18.2569133Z  * [new branch]              gh/davidberard98/392/orig   -> origin/gh/davidberard98/392/orig
2025-12-04T09:17:18.2571742Z  * [new branch]              gh/davidberard98/399/base   -> origin/gh/davidberard98/399/base
2025-12-04T09:17:18.2573647Z  * [new branch]              gh/davidberard98/399/head   -> origin/gh/davidberard98/399/head
2025-12-04T09:17:18.2575507Z  * [new branch]              gh/davidberard98/399/orig   -> origin/gh/davidberard98/399/orig
2025-12-04T09:17:18.2579044Z  * [new branch]              gh/desertfire/605/base      -> origin/gh/desertfire/605/base
2025-12-04T09:17:18.2581004Z  * [new branch]              gh/desertfire/605/head      -> origin/gh/desertfire/605/head
2025-12-04T09:17:18.2582854Z  * [new branch]              gh/desertfire/605/orig      -> origin/gh/desertfire/605/orig
2025-12-04T09:17:18.2585430Z  * [new branch]              gh/desertfire/606/base      -> origin/gh/desertfire/606/base
2025-12-04T09:17:18.2587249Z  * [new branch]              gh/desertfire/606/head      -> origin/gh/desertfire/606/head
2025-12-04T09:17:18.2589206Z  * [new branch]              gh/desertfire/606/orig      -> origin/gh/desertfire/606/orig
2025-12-04T09:17:18.2591754Z  * [new branch]              gh/desertfire/607/base      -> origin/gh/desertfire/607/base
2025-12-04T09:17:18.2593604Z  * [new branch]              gh/desertfire/607/head      -> origin/gh/desertfire/607/head
2025-12-04T09:17:18.2595500Z  * [new branch]              gh/desertfire/607/orig      -> origin/gh/desertfire/607/orig
2025-12-04T09:17:18.2598098Z  * [new branch]              gh/desertfire/608/base      -> origin/gh/desertfire/608/base
2025-12-04T09:17:18.2599922Z  * [new branch]              gh/desertfire/608/head      -> origin/gh/desertfire/608/head
2025-12-04T09:17:18.2601786Z  * [new branch]              gh/desertfire/608/orig      -> origin/gh/desertfire/608/orig
2025-12-04T09:17:18.2604276Z  * [new branch]              gh/desertfire/609/base      -> origin/gh/desertfire/609/base
2025-12-04T09:17:18.2606094Z  * [new branch]              gh/desertfire/609/head      -> origin/gh/desertfire/609/head
2025-12-04T09:17:18.2608158Z  * [new branch]              gh/desertfire/609/orig      -> origin/gh/desertfire/609/orig
2025-12-04T09:17:18.2610985Z  * [new branch]              gh/desertfire/610/base      -> origin/gh/desertfire/610/base
2025-12-04T09:17:18.2612806Z  * [new branch]              gh/desertfire/610/head      -> origin/gh/desertfire/610/head
2025-12-04T09:17:18.2615025Z  * [new branch]              gh/desertfire/610/orig      -> origin/gh/desertfire/610/orig
2025-12-04T09:17:18.2617043Z  * [new branch]              gh/desertfire/611/base      -> origin/gh/desertfire/611/base
2025-12-04T09:17:18.2618908Z  * [new branch]              gh/desertfire/611/head      -> origin/gh/desertfire/611/head
2025-12-04T09:17:18.2620943Z  * [new branch]              gh/desertfire/611/orig      -> origin/gh/desertfire/611/orig
2025-12-04T09:17:18.2623547Z  * [new branch]              gh/desertfire/612/base      -> origin/gh/desertfire/612/base
2025-12-04T09:17:18.2625593Z  * [new branch]              gh/desertfire/612/head      -> origin/gh/desertfire/612/head
2025-12-04T09:17:18.2627350Z  * [new branch]              gh/desertfire/612/orig      -> origin/gh/desertfire/612/orig
2025-12-04T09:17:18.2629842Z  * [new branch]              gh/desertfire/613/base      -> origin/gh/desertfire/613/base
2025-12-04T09:17:18.2631737Z  * [new branch]              gh/desertfire/613/head      -> origin/gh/desertfire/613/head
2025-12-04T09:17:18.2633619Z  * [new branch]              gh/desertfire/613/orig      -> origin/gh/desertfire/613/orig
2025-12-04T09:17:18.2642578Z  * [new branch]              gh/desertfire/614/base      -> origin/gh/desertfire/614/base
2025-12-04T09:17:18.2642953Z  * [new branch]              gh/desertfire/614/head      -> origin/gh/desertfire/614/head
2025-12-04T09:17:18.2643202Z  * [new branch]              gh/desertfire/614/orig      -> origin/gh/desertfire/614/orig
2025-12-04T09:17:18.2643425Z  * [new branch]              gh/desertfire/615/base      -> origin/gh/desertfire/615/base
2025-12-04T09:17:18.2644865Z  * [new branch]              gh/desertfire/615/head      -> origin/gh/desertfire/615/head
2025-12-04T09:17:18.2646606Z  * [new branch]              gh/desertfire/615/orig      -> origin/gh/desertfire/615/orig
2025-12-04T09:17:18.2648950Z  * [new branch]              gh/desertfire/616/base      -> origin/gh/desertfire/616/base
2025-12-04T09:17:18.2650898Z  * [new branch]              gh/desertfire/616/head      -> origin/gh/desertfire/616/head
2025-12-04T09:17:18.2652649Z  * [new branch]              gh/desertfire/616/orig      -> origin/gh/desertfire/616/orig
2025-12-04T09:17:18.2654997Z  * [new branch]              gh/desertfire/617/base      -> origin/gh/desertfire/617/base
2025-12-04T09:17:18.2657013Z  * [new branch]              gh/desertfire/617/head      -> origin/gh/desertfire/617/head
2025-12-04T09:17:18.2658840Z  * [new branch]              gh/desertfire/617/orig      -> origin/gh/desertfire/617/orig
2025-12-04T09:17:18.2662133Z  * [new branch]              gh/dharakk/1/base           -> origin/gh/dharakk/1/base
2025-12-04T09:17:18.2663994Z  * [new branch]              gh/dharakk/1/head           -> origin/gh/dharakk/1/head
2025-12-04T09:17:18.2667072Z  * [new branch]              gh/drisspg/170/base         -> origin/gh/drisspg/170/base
2025-12-04T09:17:18.2669212Z  * [new branch]              gh/drisspg/170/head         -> origin/gh/drisspg/170/head
2025-12-04T09:17:18.2670735Z  * [new branch]              gh/drisspg/170/orig         -> origin/gh/drisspg/170/orig
2025-12-04T09:17:18.2673245Z  * [new branch]              gh/drisspg/182/base         -> origin/gh/drisspg/182/base
2025-12-04T09:17:18.2675082Z  * [new branch]              gh/drisspg/182/head         -> origin/gh/drisspg/182/head
2025-12-04T09:17:18.2677432Z  * [new branch]              gh/drisspg/183/base         -> origin/gh/drisspg/183/base
2025-12-04T09:17:18.2679155Z  * [new branch]              gh/drisspg/183/head         -> origin/gh/drisspg/183/head
2025-12-04T09:17:18.2681492Z  * [new branch]              gh/drisspg/184/base         -> origin/gh/drisspg/184/base
2025-12-04T09:17:18.2683455Z  * [new branch]              gh/drisspg/184/head         -> origin/gh/drisspg/184/head
2025-12-04T09:17:18.2685947Z  * [new branch]              gh/drisspg/185/base         -> origin/gh/drisspg/185/base
2025-12-04T09:17:18.2687810Z  * [new branch]              gh/drisspg/185/head         -> origin/gh/drisspg/185/head
2025-12-04T09:17:18.2690323Z  * [new branch]              gh/drisspg/194/base         -> origin/gh/drisspg/194/base
2025-12-04T09:17:18.2692169Z  * [new branch]              gh/drisspg/194/head         -> origin/gh/drisspg/194/head
2025-12-04T09:17:18.2693987Z  * [new branch]              gh/drisspg/194/orig         -> origin/gh/drisspg/194/orig
2025-12-04T09:17:18.2696483Z  * [new branch]              gh/drisspg/200/base         -> origin/gh/drisspg/200/base
2025-12-04T09:17:18.2698522Z  * [new branch]              gh/drisspg/200/head         -> origin/gh/drisspg/200/head
2025-12-04T09:17:18.2700344Z  * [new branch]              gh/drisspg/200/orig         -> origin/gh/drisspg/200/orig
2025-12-04T09:17:18.2702730Z  * [new branch]              gh/drisspg/218/base         -> origin/gh/drisspg/218/base
2025-12-04T09:17:18.2704564Z  * [new branch]              gh/drisspg/218/head         -> origin/gh/drisspg/218/head
2025-12-04T09:17:18.2706424Z  * [new branch]              gh/drisspg/218/orig         -> origin/gh/drisspg/218/orig
2025-12-04T09:17:18.2711569Z  * [new branch]              gh/drisspg/219/base         -> origin/gh/drisspg/219/base
2025-12-04T09:17:18.2713307Z  * [new branch]              gh/drisspg/219/head         -> origin/gh/drisspg/219/head
2025-12-04T09:17:18.2715216Z  * [new branch]              gh/drisspg/219/orig         -> origin/gh/drisspg/219/orig
2025-12-04T09:17:18.2717615Z  * [new branch]              gh/drisspg/220/base         -> origin/gh/drisspg/220/base
2025-12-04T09:17:18.2719459Z  * [new branch]              gh/drisspg/220/head         -> origin/gh/drisspg/220/head
2025-12-04T09:17:18.2721220Z  * [new branch]              gh/drisspg/220/orig         -> origin/gh/drisspg/220/orig
2025-12-04T09:17:18.2723841Z  * [new branch]              gh/drisspg/221/base         -> origin/gh/drisspg/221/base
2025-12-04T09:17:18.2725714Z  * [new branch]              gh/drisspg/221/head         -> origin/gh/drisspg/221/head
2025-12-04T09:17:18.2727581Z  * [new branch]              gh/drisspg/221/orig         -> origin/gh/drisspg/221/orig
2025-12-04T09:17:18.2730376Z  * [new branch]              gh/drisspg/222/base         -> origin/gh/drisspg/222/base
2025-12-04T09:17:18.2732184Z  * [new branch]              gh/drisspg/222/head         -> origin/gh/drisspg/222/head
2025-12-04T09:17:18.2733977Z  * [new branch]              gh/drisspg/222/orig         -> origin/gh/drisspg/222/orig
2025-12-04T09:17:18.2736580Z  * [new branch]              gh/drisspg/223/base         -> origin/gh/drisspg/223/base
2025-12-04T09:17:18.2738370Z  * [new branch]              gh/drisspg/223/head         -> origin/gh/drisspg/223/head
2025-12-04T09:17:18.2740396Z  * [new branch]              gh/drisspg/223/orig         -> origin/gh/drisspg/223/orig
2025-12-04T09:17:18.2742906Z  * [new branch]              gh/drisspg/224/base         -> origin/gh/drisspg/224/base
2025-12-04T09:17:18.2744716Z  * [new branch]              gh/drisspg/224/head         -> origin/gh/drisspg/224/head
2025-12-04T09:17:18.2746563Z  * [new branch]              gh/drisspg/224/orig         -> origin/gh/drisspg/224/orig
2025-12-04T09:17:18.2749020Z  * [new branch]              gh/drisspg/225/base         -> origin/gh/drisspg/225/base
2025-12-04T09:17:18.2750881Z  * [new branch]              gh/drisspg/225/head         -> origin/gh/drisspg/225/head
2025-12-04T09:17:18.2752697Z  * [new branch]              gh/drisspg/225/orig         -> origin/gh/drisspg/225/orig
2025-12-04T09:17:18.2755203Z  * [new branch]              gh/drisspg/226/base         -> origin/gh/drisspg/226/base
2025-12-04T09:17:18.2756952Z  * [new branch]              gh/drisspg/226/head         -> origin/gh/drisspg/226/head
2025-12-04T09:17:18.2758760Z  * [new branch]              gh/drisspg/226/orig         -> origin/gh/drisspg/226/orig
2025-12-04T09:17:18.2762026Z  * [new branch]              gh/drisspg/227/base         -> origin/gh/drisspg/227/base
2025-12-04T09:17:18.2763809Z  * [new branch]              gh/drisspg/227/head         -> origin/gh/drisspg/227/head
2025-12-04T09:17:18.2765648Z  * [new branch]              gh/drisspg/227/orig         -> origin/gh/drisspg/227/orig
2025-12-04T09:17:18.2768259Z  * [new branch]              gh/drisspg/228/base         -> origin/gh/drisspg/228/base
2025-12-04T09:17:18.2770060Z  * [new branch]              gh/drisspg/228/head         -> origin/gh/drisspg/228/head
2025-12-04T09:17:18.2771907Z  * [new branch]              gh/drisspg/228/orig         -> origin/gh/drisspg/228/orig
2025-12-04T09:17:18.2774345Z  * [new branch]              gh/drisspg/229/base         -> origin/gh/drisspg/229/base
2025-12-04T09:17:18.2776177Z  * [new branch]              gh/drisspg/229/head         -> origin/gh/drisspg/229/head
2025-12-04T09:17:18.2778152Z  * [new branch]              gh/drisspg/229/orig         -> origin/gh/drisspg/229/orig
2025-12-04T09:17:18.2780861Z  * [new branch]              gh/drisspg/230/base         -> origin/gh/drisspg/230/base
2025-12-04T09:17:18.2782709Z  * [new branch]              gh/drisspg/230/head         -> origin/gh/drisspg/230/head
2025-12-04T09:17:18.2785341Z  * [new branch]              gh/drisspg/230/orig         -> origin/gh/drisspg/230/orig
2025-12-04T09:17:18.2788561Z  * [new branch]              gh/dsjohns2/1/base          -> origin/gh/dsjohns2/1/base
2025-12-04T09:17:18.2790395Z  * [new branch]              gh/dsjohns2/1/head          -> origin/gh/dsjohns2/1/head
2025-12-04T09:17:18.2793577Z  * [new branch]              gh/dzmitry-huba/1/base      -> origin/gh/dzmitry-huba/1/base
2025-12-04T09:17:18.2795381Z  * [new branch]              gh/dzmitry-huba/1/head      -> origin/gh/dzmitry-huba/1/head
2025-12-04T09:17:18.2798081Z  * [new branch]              gh/dzmitry-huba/12/base     -> origin/gh/dzmitry-huba/12/base
2025-12-04T09:17:18.2800042Z  * [new branch]              gh/dzmitry-huba/12/head     -> origin/gh/dzmitry-huba/12/head
2025-12-04T09:17:18.2801940Z  * [new branch]              gh/dzmitry-huba/12/orig     -> origin/gh/dzmitry-huba/12/orig
2025-12-04T09:17:18.2804545Z  * [new branch]              gh/dzmitry-huba/13/base     -> origin/gh/dzmitry-huba/13/base
2025-12-04T09:17:18.2806463Z  * [new branch]              gh/dzmitry-huba/13/head     -> origin/gh/dzmitry-huba/13/head
2025-12-04T09:17:18.2808436Z  * [new branch]              gh/dzmitry-huba/13/orig     -> origin/gh/dzmitry-huba/13/orig
2025-12-04T09:17:18.2811078Z  * [new branch]              gh/dzmitry-huba/14/base     -> origin/gh/dzmitry-huba/14/base
2025-12-04T09:17:18.2812866Z  * [new branch]              gh/dzmitry-huba/14/head     -> origin/gh/dzmitry-huba/14/head
2025-12-04T09:17:18.2814650Z  * [new branch]              gh/dzmitry-huba/14/orig     -> origin/gh/dzmitry-huba/14/orig
2025-12-04T09:17:18.2817317Z  * [new branch]              gh/dzmitry-huba/15/base     -> origin/gh/dzmitry-huba/15/base
2025-12-04T09:17:18.2819278Z  * [new branch]              gh/dzmitry-huba/15/head     -> origin/gh/dzmitry-huba/15/head
2025-12-04T09:17:18.2821029Z  * [new branch]              gh/dzmitry-huba/15/orig     -> origin/gh/dzmitry-huba/15/orig
2025-12-04T09:17:18.2823694Z  * [new branch]              gh/dzmitry-huba/16/base     -> origin/gh/dzmitry-huba/16/base
2025-12-04T09:17:18.2825563Z  * [new branch]              gh/dzmitry-huba/16/head     -> origin/gh/dzmitry-huba/16/head
2025-12-04T09:17:18.2827495Z  * [new branch]              gh/dzmitry-huba/16/orig     -> origin/gh/dzmitry-huba/16/orig
2025-12-04T09:17:18.2830110Z  * [new branch]              gh/dzmitry-huba/17/base     -> origin/gh/dzmitry-huba/17/base
2025-12-04T09:17:18.2831964Z  * [new branch]              gh/dzmitry-huba/17/head     -> origin/gh/dzmitry-huba/17/head
2025-12-04T09:17:18.2833797Z  * [new branch]              gh/dzmitry-huba/17/orig     -> origin/gh/dzmitry-huba/17/orig
2025-12-04T09:17:18.2836216Z  * [new branch]              gh/dzmitry-huba/2/base      -> origin/gh/dzmitry-huba/2/base
2025-12-04T09:17:18.2838002Z  * [new branch]              gh/dzmitry-huba/2/head      -> origin/gh/dzmitry-huba/2/head
2025-12-04T09:17:18.2840410Z  * [new branch]              gh/dzmitry-huba/3/base      -> origin/gh/dzmitry-huba/3/base
2025-12-04T09:17:18.2842172Z  * [new branch]              gh/dzmitry-huba/3/head      -> origin/gh/dzmitry-huba/3/head
2025-12-04T09:17:18.2845372Z  * [new branch]              gh/eellison/808/base        -> origin/gh/eellison/808/base
2025-12-04T09:17:18.2847241Z  * [new branch]              gh/eellison/808/head        -> origin/gh/eellison/808/head
2025-12-04T09:17:18.2849111Z  * [new branch]              gh/eellison/808/orig        -> origin/gh/eellison/808/orig
2025-12-04T09:17:18.2851872Z  * [new branch]              gh/eellison/822/base        -> origin/gh/eellison/822/base
2025-12-04T09:17:18.2853903Z  * [new branch]              gh/eellison/822/head        -> origin/gh/eellison/822/head
2025-12-04T09:17:18.2855534Z  * [new branch]              gh/eellison/822/orig        -> origin/gh/eellison/822/orig
2025-12-04T09:17:18.2858126Z  * [new branch]              gh/eellison/823/base        -> origin/gh/eellison/823/base
2025-12-04T09:17:18.2860183Z  * [new branch]              gh/eellison/823/head        -> origin/gh/eellison/823/head
2025-12-04T09:17:18.2861961Z  * [new branch]              gh/eellison/823/orig        -> origin/gh/eellison/823/orig
2025-12-04T09:17:18.2864522Z  * [new branch]              gh/eellison/862/base        -> origin/gh/eellison/862/base
2025-12-04T09:17:18.2866361Z  * [new branch]              gh/eellison/862/head        -> origin/gh/eellison/862/head
2025-12-04T09:17:18.2868197Z  * [new branch]              gh/eellison/862/orig        -> origin/gh/eellison/862/orig
2025-12-04T09:17:18.2870715Z  * [new branch]              gh/eellison/863/base        -> origin/gh/eellison/863/base
2025-12-04T09:17:18.2872732Z  * [new branch]              gh/eellison/863/head        -> origin/gh/eellison/863/head
2025-12-04T09:17:18.2874742Z  * [new branch]              gh/eellison/863/orig        -> origin/gh/eellison/863/orig
2025-12-04T09:17:18.2877199Z  * [new branch]              gh/eellison/864/base        -> origin/gh/eellison/864/base
2025-12-04T09:17:18.2878927Z  * [new branch]              gh/eellison/864/head        -> origin/gh/eellison/864/head
2025-12-04T09:17:18.2881009Z  * [new branch]              gh/eellison/864/orig        -> origin/gh/eellison/864/orig
2025-12-04T09:17:18.2884073Z  * [new branch]              gh/eellison/865/base        -> origin/gh/eellison/865/base
2025-12-04T09:17:18.2886846Z  * [new branch]              gh/eellison/865/head        -> origin/gh/eellison/865/head
2025-12-04T09:17:18.2889375Z  * [new branch]              gh/eellison/865/orig        -> origin/gh/eellison/865/orig
2025-12-04T09:17:18.2893184Z  * [new branch]              gh/eellison/866/base        -> origin/gh/eellison/866/base
2025-12-04T09:17:18.2895536Z  * [new branch]              gh/eellison/866/head        -> origin/gh/eellison/866/head
2025-12-04T09:17:18.2897940Z  * [new branch]              gh/eellison/866/orig        -> origin/gh/eellison/866/orig
2025-12-04T09:17:18.2901802Z  * [new branch]              gh/eellison/867/base        -> origin/gh/eellison/867/base
2025-12-04T09:17:18.2903975Z  * [new branch]              gh/eellison/867/head        -> origin/gh/eellison/867/head
2025-12-04T09:17:18.2906443Z  * [new branch]              gh/eellison/867/orig        -> origin/gh/eellison/867/orig
2025-12-04T09:17:18.2910642Z  * [new branch]              gh/eellison/868/base        -> origin/gh/eellison/868/base
2025-12-04T09:17:18.2913379Z  * [new branch]              gh/eellison/868/head        -> origin/gh/eellison/868/head
2025-12-04T09:17:18.2915786Z  * [new branch]              gh/eellison/868/orig        -> origin/gh/eellison/868/orig
2025-12-04T09:17:18.2919260Z  * [new branch]              gh/eellison/869/base        -> origin/gh/eellison/869/base
2025-12-04T09:17:18.2921660Z  * [new branch]              gh/eellison/869/head        -> origin/gh/eellison/869/head
2025-12-04T09:17:18.2924056Z  * [new branch]              gh/eellison/869/orig        -> origin/gh/eellison/869/orig
2025-12-04T09:17:18.2927374Z  * [new branch]              gh/eellison/870/base        -> origin/gh/eellison/870/base
2025-12-04T09:17:18.2929824Z  * [new branch]              gh/eellison/870/head        -> origin/gh/eellison/870/head
2025-12-04T09:17:18.2932217Z  * [new branch]              gh/eellison/870/orig        -> origin/gh/eellison/870/orig
2025-12-04T09:17:18.2935961Z  * [new branch]              gh/eellison/871/base        -> origin/gh/eellison/871/base
2025-12-04T09:17:18.2937569Z  * [new branch]              gh/eellison/871/head        -> origin/gh/eellison/871/head
2025-12-04T09:17:18.2939549Z  * [new branch]              gh/eellison/871/orig        -> origin/gh/eellison/871/orig
2025-12-04T09:17:18.2942425Z  * [new branch]              gh/eellison/872/base        -> origin/gh/eellison/872/base
2025-12-04T09:17:18.2944170Z  * [new branch]              gh/eellison/872/head        -> origin/gh/eellison/872/head
2025-12-04T09:17:18.2946056Z  * [new branch]              gh/eellison/872/orig        -> origin/gh/eellison/872/orig
2025-12-04T09:17:18.2948934Z  * [new branch]              gh/eellison/873/base        -> origin/gh/eellison/873/base
2025-12-04T09:17:18.2950738Z  * [new branch]              gh/eellison/873/head        -> origin/gh/eellison/873/head
2025-12-04T09:17:18.2952601Z  * [new branch]              gh/eellison/873/orig        -> origin/gh/eellison/873/orig
2025-12-04T09:17:18.2955269Z  * [new branch]              gh/eellison/874/base        -> origin/gh/eellison/874/base
2025-12-04T09:17:18.2957227Z  * [new branch]              gh/eellison/874/head        -> origin/gh/eellison/874/head
2025-12-04T09:17:18.2959109Z  * [new branch]              gh/eellison/874/orig        -> origin/gh/eellison/874/orig
2025-12-04T09:17:18.2962313Z  * [new branch]              gh/eellison/875/base        -> origin/gh/eellison/875/base
2025-12-04T09:17:18.2964213Z  * [new branch]              gh/eellison/875/head        -> origin/gh/eellison/875/head
2025-12-04T09:17:18.2966067Z  * [new branch]              gh/eellison/875/orig        -> origin/gh/eellison/875/orig
2025-12-04T09:17:18.2968725Z  * [new branch]              gh/eellison/876/base        -> origin/gh/eellison/876/base
2025-12-04T09:17:18.2970619Z  * [new branch]              gh/eellison/876/head        -> origin/gh/eellison/876/head
2025-12-04T09:17:18.2972650Z  * [new branch]              gh/eellison/876/orig        -> origin/gh/eellison/876/orig
2025-12-04T09:17:18.2975510Z  * [new branch]              gh/eellison/877/base        -> origin/gh/eellison/877/base
2025-12-04T09:17:18.2977452Z  * [new branch]              gh/eellison/877/head        -> origin/gh/eellison/877/head
2025-12-04T09:17:18.2979268Z  * [new branch]              gh/eellison/877/orig        -> origin/gh/eellison/877/orig
2025-12-04T09:17:18.2982299Z  * [new branch]              gh/eellison/878/base        -> origin/gh/eellison/878/base
2025-12-04T09:17:18.2984076Z  * [new branch]              gh/eellison/878/head        -> origin/gh/eellison/878/head
2025-12-04T09:17:18.2985971Z  * [new branch]              gh/eellison/878/orig        -> origin/gh/eellison/878/orig
2025-12-04T09:17:18.2988649Z  * [new branch]              gh/eellison/879/base        -> origin/gh/eellison/879/base
2025-12-04T09:17:18.2990580Z  * [new branch]              gh/eellison/879/head        -> origin/gh/eellison/879/head
2025-12-04T09:17:18.2993050Z  * [new branch]              gh/eellison/879/orig        -> origin/gh/eellison/879/orig
2025-12-04T09:17:18.2995525Z  * [new branch]              gh/eellison/880/base        -> origin/gh/eellison/880/base
2025-12-04T09:17:18.2997420Z  * [new branch]              gh/eellison/880/head        -> origin/gh/eellison/880/head
2025-12-04T09:17:18.2999356Z  * [new branch]              gh/eellison/880/orig        -> origin/gh/eellison/880/orig
2025-12-04T09:17:18.3002600Z  * [new branch]              gh/eellison/881/base        -> origin/gh/eellison/881/base
2025-12-04T09:17:18.3004045Z  * [new branch]              gh/eellison/881/head        -> origin/gh/eellison/881/head
2025-12-04T09:17:18.3005985Z  * [new branch]              gh/eellison/881/orig        -> origin/gh/eellison/881/orig
2025-12-04T09:17:18.3008696Z  * [new branch]              gh/eellison/882/base        -> origin/gh/eellison/882/base
2025-12-04T09:17:18.3010732Z  * [new branch]              gh/eellison/882/head        -> origin/gh/eellison/882/head
2025-12-04T09:17:18.3012817Z  * [new branch]              gh/eellison/882/orig        -> origin/gh/eellison/882/orig
2025-12-04T09:17:18.3015360Z  * [new branch]              gh/eellison/883/base        -> origin/gh/eellison/883/base
2025-12-04T09:17:18.3017195Z  * [new branch]              gh/eellison/883/head        -> origin/gh/eellison/883/head
2025-12-04T09:17:18.3019195Z  * [new branch]              gh/eellison/883/orig        -> origin/gh/eellison/883/orig
2025-12-04T09:17:18.3021833Z  * [new branch]              gh/eellison/884/base        -> origin/gh/eellison/884/base
2025-12-04T09:17:18.3023714Z  * [new branch]              gh/eellison/884/head        -> origin/gh/eellison/884/head
2025-12-04T09:17:18.3025523Z  * [new branch]              gh/eellison/884/orig        -> origin/gh/eellison/884/orig
2025-12-04T09:17:18.3028706Z  * [new branch]              gh/etaf/147/base            -> origin/gh/etaf/147/base
2025-12-04T09:17:18.3030598Z  * [new branch]              gh/etaf/147/head            -> origin/gh/etaf/147/head
2025-12-04T09:17:18.3033344Z  * [new branch]              gh/etaf/154/base            -> origin/gh/etaf/154/base
2025-12-04T09:17:18.3035253Z  * [new branch]              gh/etaf/154/head            -> origin/gh/etaf/154/head
2025-12-04T09:17:18.3037054Z  * [new branch]              gh/etaf/154/orig            -> origin/gh/etaf/154/orig
2025-12-04T09:17:18.3039591Z  * [new branch]              gh/etaf/156/base            -> origin/gh/etaf/156/base
2025-12-04T09:17:18.3041494Z  * [new branch]              gh/etaf/156/head            -> origin/gh/etaf/156/head
2025-12-04T09:17:18.3043375Z  * [new branch]              gh/etaf/156/orig            -> origin/gh/etaf/156/orig
2025-12-04T09:17:18.3046241Z  * [new branch]              gh/etaf/157/base            -> origin/gh/etaf/157/base
2025-12-04T09:17:18.3048112Z  * [new branch]              gh/etaf/157/head            -> origin/gh/etaf/157/head
2025-12-04T09:17:18.3050003Z  * [new branch]              gh/etaf/157/orig            -> origin/gh/etaf/157/orig
2025-12-04T09:17:18.3052661Z  * [new branch]              gh/etaf/158/base            -> origin/gh/etaf/158/base
2025-12-04T09:17:18.3054561Z  * [new branch]              gh/etaf/158/head            -> origin/gh/etaf/158/head
2025-12-04T09:17:18.3056435Z  * [new branch]              gh/etaf/158/orig            -> origin/gh/etaf/158/orig
2025-12-04T09:17:18.3059284Z  * [new branch]              gh/etaf/159/base            -> origin/gh/etaf/159/base
2025-12-04T09:17:18.3061401Z  * [new branch]              gh/etaf/159/head            -> origin/gh/etaf/159/head
2025-12-04T09:17:18.3063342Z  * [new branch]              gh/etaf/159/orig            -> origin/gh/etaf/159/orig
2025-12-04T09:17:18.3066517Z  * [new branch]              gh/etaf/160/base            -> origin/gh/etaf/160/base
2025-12-04T09:17:18.3068410Z  * [new branch]              gh/etaf/160/head            -> origin/gh/etaf/160/head
2025-12-04T09:17:18.3070260Z  * [new branch]              gh/etaf/160/orig            -> origin/gh/etaf/160/orig
2025-12-04T09:17:18.3072905Z  * [new branch]              gh/etaf/161/base            -> origin/gh/etaf/161/base
2025-12-04T09:17:18.3074903Z  * [new branch]              gh/etaf/161/head            -> origin/gh/etaf/161/head
2025-12-04T09:17:18.3076793Z  * [new branch]              gh/etaf/161/orig            -> origin/gh/etaf/161/orig
2025-12-04T09:17:18.3079478Z  * [new branch]              gh/etaf/166/base            -> origin/gh/etaf/166/base
2025-12-04T09:17:18.3081603Z  * [new branch]              gh/etaf/166/head            -> origin/gh/etaf/166/head
2025-12-04T09:17:18.3083490Z  * [new branch]              gh/etaf/166/orig            -> origin/gh/etaf/166/orig
2025-12-04T09:17:18.3085962Z  * [new branch]              gh/etaf/167/base            -> origin/gh/etaf/167/base
2025-12-04T09:17:18.3087895Z  * [new branch]              gh/etaf/167/head            -> origin/gh/etaf/167/head
2025-12-04T09:17:18.3089761Z  * [new branch]              gh/etaf/167/orig            -> origin/gh/etaf/167/orig
2025-12-04T09:17:18.3092494Z  * [new branch]              gh/etaf/168/base            -> origin/gh/etaf/168/base
2025-12-04T09:17:18.3094389Z  * [new branch]              gh/etaf/168/head            -> origin/gh/etaf/168/head
2025-12-04T09:17:18.3096218Z  * [new branch]              gh/etaf/168/orig            -> origin/gh/etaf/168/orig
2025-12-04T09:17:18.3098952Z  * [new branch]              gh/etaf/172/base            -> origin/gh/etaf/172/base
2025-12-04T09:17:18.3100989Z  * [new branch]              gh/etaf/172/head            -> origin/gh/etaf/172/head
2025-12-04T09:17:18.3102843Z  * [new branch]              gh/etaf/172/orig            -> origin/gh/etaf/172/orig
2025-12-04T09:17:18.3105713Z  * [new branch]              gh/etaf/173/base            -> origin/gh/etaf/173/base
2025-12-04T09:17:18.3107909Z  * [new branch]              gh/etaf/173/head            -> origin/gh/etaf/173/head
2025-12-04T09:17:18.3112597Z  * [new branch]              gh/etaf/173/orig            -> origin/gh/etaf/173/orig
2025-12-04T09:17:18.3115227Z  * [new branch]              gh/etaf/174/base            -> origin/gh/etaf/174/base
2025-12-04T09:17:18.3117087Z  * [new branch]              gh/etaf/174/head            -> origin/gh/etaf/174/head
2025-12-04T09:17:18.3119684Z  * [new branch]              gh/etaf/175/base            -> origin/gh/etaf/175/base
2025-12-04T09:17:18.3121950Z  * [new branch]              gh/etaf/175/head            -> origin/gh/etaf/175/head
2025-12-04T09:17:18.3123369Z  * [new branch]              gh/etaf/175/orig            -> origin/gh/etaf/175/orig
2025-12-04T09:17:18.3125982Z  * [new branch]              gh/etaf/176/base            -> origin/gh/etaf/176/base
2025-12-04T09:17:18.3127940Z  * [new branch]              gh/etaf/176/head            -> origin/gh/etaf/176/head
2025-12-04T09:17:18.3129774Z  * [new branch]              gh/etaf/176/orig            -> origin/gh/etaf/176/orig
2025-12-04T09:17:18.3132863Z  * [new branch]              gh/etaf/177/base            -> origin/gh/etaf/177/base
2025-12-04T09:17:18.3134967Z  * [new branch]              gh/etaf/177/head            -> origin/gh/etaf/177/head
2025-12-04T09:17:18.3137281Z  * [new branch]              gh/etaf/177/orig            -> origin/gh/etaf/177/orig
2025-12-04T09:17:18.3140189Z  * [new branch]              gh/etaf/178/base            -> origin/gh/etaf/178/base
2025-12-04T09:17:18.3142303Z  * [new branch]              gh/etaf/178/head            -> origin/gh/etaf/178/head
2025-12-04T09:17:18.3144115Z  * [new branch]              gh/etaf/178/orig            -> origin/gh/etaf/178/orig
2025-12-04T09:17:18.3146789Z  * [new branch]              gh/etaf/179/base            -> origin/gh/etaf/179/base
2025-12-04T09:17:18.3148685Z  * [new branch]              gh/etaf/179/head            -> origin/gh/etaf/179/head
2025-12-04T09:17:18.3150554Z  * [new branch]              gh/etaf/179/orig            -> origin/gh/etaf/179/orig
2025-12-04T09:17:18.3153116Z  * [new branch]              gh/etaf/180/base            -> origin/gh/etaf/180/base
2025-12-04T09:17:18.3155269Z  * [new branch]              gh/etaf/180/head            -> origin/gh/etaf/180/head
2025-12-04T09:17:18.3157117Z  * [new branch]              gh/etaf/180/orig            -> origin/gh/etaf/180/orig
2025-12-04T09:17:18.3160614Z  * [new branch]              gh/exclamaforte/1/base      -> origin/gh/exclamaforte/1/base
2025-12-04T09:17:18.3162254Z  * [new branch]              gh/exclamaforte/1/head      -> origin/gh/exclamaforte/1/head
2025-12-04T09:17:18.3164651Z  * [new branch]              gh/exclamaforte/2/base      -> origin/gh/exclamaforte/2/base
2025-12-04T09:17:18.3166418Z  * [new branch]              gh/exclamaforte/2/head      -> origin/gh/exclamaforte/2/head
2025-12-04T09:17:18.3168926Z  * [new branch]              gh/exclamaforte/3/base      -> origin/gh/exclamaforte/3/base
2025-12-04T09:17:18.3170854Z  * [new branch]              gh/exclamaforte/3/head      -> origin/gh/exclamaforte/3/head
2025-12-04T09:17:18.3173434Z  * [new branch]              gh/exclamaforte/4/base      -> origin/gh/exclamaforte/4/base
2025-12-04T09:17:18.3175216Z  * [new branch]              gh/exclamaforte/4/head      -> origin/gh/exclamaforte/4/head
2025-12-04T09:17:18.3178635Z  * [new branch]              gh/ezyang/2374/base         -> origin/gh/ezyang/2374/base
2025-12-04T09:17:18.3180616Z  * [new branch]              gh/ezyang/2374/head         -> origin/gh/ezyang/2374/head
2025-12-04T09:17:18.3182663Z  * [new branch]              gh/ezyang/2374/orig         -> origin/gh/ezyang/2374/orig
2025-12-04T09:17:18.3185156Z  * [new branch]              gh/ezyang/2973/base         -> origin/gh/ezyang/2973/base
2025-12-04T09:17:18.3187108Z  * [new branch]              gh/ezyang/2973/head         -> origin/gh/ezyang/2973/head
2025-12-04T09:17:18.3188900Z  * [new branch]              gh/ezyang/2973/orig         -> origin/gh/ezyang/2973/orig
2025-12-04T09:17:18.3191448Z  * [new branch]              gh/ezyang/2974/base         -> origin/gh/ezyang/2974/base
2025-12-04T09:17:18.3193276Z  * [new branch]              gh/ezyang/2974/head         -> origin/gh/ezyang/2974/head
2025-12-04T09:17:18.3195137Z  * [new branch]              gh/ezyang/2974/orig         -> origin/gh/ezyang/2974/orig
2025-12-04T09:17:18.3197628Z  * [new branch]              gh/ezyang/3131/base         -> origin/gh/ezyang/3131/base
2025-12-04T09:17:18.3199608Z  * [new branch]              gh/ezyang/3131/head         -> origin/gh/ezyang/3131/head
2025-12-04T09:17:18.3201456Z  * [new branch]              gh/ezyang/3131/orig         -> origin/gh/ezyang/3131/orig
2025-12-04T09:17:18.3204005Z  * [new branch]              gh/ezyang/3139/base         -> origin/gh/ezyang/3139/base
2025-12-04T09:17:18.3205838Z  * [new branch]              gh/ezyang/3139/head         -> origin/gh/ezyang/3139/head
2025-12-04T09:17:18.3208019Z  * [new branch]              gh/ezyang/3139/orig         -> origin/gh/ezyang/3139/orig
2025-12-04T09:17:18.3210819Z  * [new branch]              gh/ezyang/3140/base         -> origin/gh/ezyang/3140/base
2025-12-04T09:17:18.3212587Z  * [new branch]              gh/ezyang/3140/head         -> origin/gh/ezyang/3140/head
2025-12-04T09:17:18.3214458Z  * [new branch]              gh/ezyang/3140/orig         -> origin/gh/ezyang/3140/orig
2025-12-04T09:17:18.3217049Z  * [new branch]              gh/ezyang/3143/base         -> origin/gh/ezyang/3143/base
2025-12-04T09:17:18.3218878Z  * [new branch]              gh/ezyang/3143/head         -> origin/gh/ezyang/3143/head
2025-12-04T09:17:18.3221017Z  * [new branch]              gh/ezyang/3143/orig         -> origin/gh/ezyang/3143/orig
2025-12-04T09:17:18.3223684Z  * [new branch]              gh/ezyang/3144/base         -> origin/gh/ezyang/3144/base
2025-12-04T09:17:18.3225564Z  * [new branch]              gh/ezyang/3144/head         -> origin/gh/ezyang/3144/head
2025-12-04T09:17:18.3227411Z  * [new branch]              gh/ezyang/3144/orig         -> origin/gh/ezyang/3144/orig
2025-12-04T09:17:18.3230383Z  * [new branch]              gh/ezyang/3167/base         -> origin/gh/ezyang/3167/base
2025-12-04T09:17:18.3231821Z  * [new branch]              gh/ezyang/3167/head         -> origin/gh/ezyang/3167/head
2025-12-04T09:17:18.3234576Z  * [new branch]              gh/ezyang/3167/orig         -> origin/gh/ezyang/3167/orig
2025-12-04T09:17:18.3237114Z  * [new branch]              gh/ezyang/3173/base         -> origin/gh/ezyang/3173/base
2025-12-04T09:17:18.3239012Z  * [new branch]              gh/ezyang/3173/head         -> origin/gh/ezyang/3173/head
2025-12-04T09:17:18.3240878Z  * [new branch]              gh/ezyang/3173/orig         -> origin/gh/ezyang/3173/orig
2025-12-04T09:17:18.3243500Z  * [new branch]              gh/ezyang/3175/base         -> origin/gh/ezyang/3175/base
2025-12-04T09:17:18.3246094Z  * [new branch]              gh/ezyang/3175/head         -> origin/gh/ezyang/3175/head
2025-12-04T09:17:18.3247911Z  * [new branch]              gh/ezyang/3175/orig         -> origin/gh/ezyang/3175/orig
2025-12-04T09:17:18.3250501Z  * [new branch]              gh/ezyang/3182/base         -> origin/gh/ezyang/3182/base
2025-12-04T09:17:18.3252386Z  * [new branch]              gh/ezyang/3182/head         -> origin/gh/ezyang/3182/head
2025-12-04T09:17:18.3254324Z  * [new branch]              gh/ezyang/3182/orig         -> origin/gh/ezyang/3182/orig
2025-12-04T09:17:18.3256892Z  * [new branch]              gh/ezyang/3185/base         -> origin/gh/ezyang/3185/base
2025-12-04T09:17:18.3258909Z  * [new branch]              gh/ezyang/3185/head         -> origin/gh/ezyang/3185/head
2025-12-04T09:17:18.3260788Z  * [new branch]              gh/ezyang/3185/orig         -> origin/gh/ezyang/3185/orig
2025-12-04T09:17:18.3263421Z  * [new branch]              gh/ezyang/3189/base         -> origin/gh/ezyang/3189/base
2025-12-04T09:17:18.3265249Z  * [new branch]              gh/ezyang/3189/head         -> origin/gh/ezyang/3189/head
2025-12-04T09:17:18.3267169Z  * [new branch]              gh/ezyang/3189/orig         -> origin/gh/ezyang/3189/orig
2025-12-04T09:17:18.3269753Z  * [new branch]              gh/ezyang/3191/base         -> origin/gh/ezyang/3191/base
2025-12-04T09:17:18.3271624Z  * [new branch]              gh/ezyang/3191/head         -> origin/gh/ezyang/3191/head
2025-12-04T09:17:18.3273522Z  * [new branch]              gh/ezyang/3191/orig         -> origin/gh/ezyang/3191/orig
2025-12-04T09:17:18.3276792Z  * [new branch]              gh/ezyang/3192/base         -> origin/gh/ezyang/3192/base
2025-12-04T09:17:18.3278676Z  * [new branch]              gh/ezyang/3192/head         -> origin/gh/ezyang/3192/head
2025-12-04T09:17:18.3280602Z  * [new branch]              gh/ezyang/3192/orig         -> origin/gh/ezyang/3192/orig
2025-12-04T09:17:18.3283232Z  * [new branch]              gh/ezyang/3193/base         -> origin/gh/ezyang/3193/base
2025-12-04T09:17:18.3285156Z  * [new branch]              gh/ezyang/3193/head         -> origin/gh/ezyang/3193/head
2025-12-04T09:17:18.3287021Z  * [new branch]              gh/ezyang/3193/orig         -> origin/gh/ezyang/3193/orig
2025-12-04T09:17:18.3289848Z  * [new branch]              gh/ezyang/3194/base         -> origin/gh/ezyang/3194/base
2025-12-04T09:17:18.3291734Z  * [new branch]              gh/ezyang/3194/head         -> origin/gh/ezyang/3194/head
2025-12-04T09:17:18.3293615Z  * [new branch]              gh/ezyang/3194/orig         -> origin/gh/ezyang/3194/orig
2025-12-04T09:17:18.3296215Z  * [new branch]              gh/ezyang/3195/base         -> origin/gh/ezyang/3195/base
2025-12-04T09:17:18.3298125Z  * [new branch]              gh/ezyang/3195/head         -> origin/gh/ezyang/3195/head
2025-12-04T09:17:18.3300044Z  * [new branch]              gh/ezyang/3195/orig         -> origin/gh/ezyang/3195/orig
2025-12-04T09:17:18.3302746Z  * [new branch]              gh/ezyang/3196/base         -> origin/gh/ezyang/3196/base
2025-12-04T09:17:18.3304642Z  * [new branch]              gh/ezyang/3196/head         -> origin/gh/ezyang/3196/head
2025-12-04T09:17:18.3306610Z  * [new branch]              gh/ezyang/3196/orig         -> origin/gh/ezyang/3196/orig
2025-12-04T09:17:18.3309778Z  * [new branch]              gh/ezyang/3197/base         -> origin/gh/ezyang/3197/base
2025-12-04T09:17:18.3311541Z  * [new branch]              gh/ezyang/3197/head         -> origin/gh/ezyang/3197/head
2025-12-04T09:17:18.3313429Z  * [new branch]              gh/ezyang/3197/orig         -> origin/gh/ezyang/3197/orig
2025-12-04T09:17:18.3316258Z  * [new branch]              gh/ezyang/3198/base         -> origin/gh/ezyang/3198/base
2025-12-04T09:17:18.3318171Z  * [new branch]              gh/ezyang/3198/head         -> origin/gh/ezyang/3198/head
2025-12-04T09:17:18.3320072Z  * [new branch]              gh/ezyang/3198/orig         -> origin/gh/ezyang/3198/orig
2025-12-04T09:17:18.3322740Z  * [new branch]              gh/ezyang/3199/base         -> origin/gh/ezyang/3199/base
2025-12-04T09:17:18.3324538Z  * [new branch]              gh/ezyang/3199/head         -> origin/gh/ezyang/3199/head
2025-12-04T09:17:18.3326504Z  * [new branch]              gh/ezyang/3199/orig         -> origin/gh/ezyang/3199/orig
2025-12-04T09:17:18.3329101Z  * [new branch]              gh/ezyang/3200/base         -> origin/gh/ezyang/3200/base
2025-12-04T09:17:18.3330958Z  * [new branch]              gh/ezyang/3200/head         -> origin/gh/ezyang/3200/head
2025-12-04T09:17:18.3332842Z  * [new branch]              gh/ezyang/3200/orig         -> origin/gh/ezyang/3200/orig
2025-12-04T09:17:18.3335477Z  * [new branch]              gh/ezyang/3201/base         -> origin/gh/ezyang/3201/base
2025-12-04T09:17:18.3337557Z  * [new branch]              gh/ezyang/3201/head         -> origin/gh/ezyang/3201/head
2025-12-04T09:17:18.3339287Z  * [new branch]              gh/ezyang/3201/orig         -> origin/gh/ezyang/3201/orig
2025-12-04T09:17:18.3342181Z  * [new branch]              gh/ezyang/3202/base         -> origin/gh/ezyang/3202/base
2025-12-04T09:17:18.3344024Z  * [new branch]              gh/ezyang/3202/head         -> origin/gh/ezyang/3202/head
2025-12-04T09:17:18.3345939Z  * [new branch]              gh/ezyang/3202/orig         -> origin/gh/ezyang/3202/orig
2025-12-04T09:17:18.3348605Z  * [new branch]              gh/ezyang/3203/base         -> origin/gh/ezyang/3203/base
2025-12-04T09:17:18.3350453Z  * [new branch]              gh/ezyang/3203/head         -> origin/gh/ezyang/3203/head
2025-12-04T09:17:18.3352513Z  * [new branch]              gh/ezyang/3203/orig         -> origin/gh/ezyang/3203/orig
2025-12-04T09:17:18.3355147Z  * [new branch]              gh/ezyang/3204/base         -> origin/gh/ezyang/3204/base
2025-12-04T09:17:18.3357074Z  * [new branch]              gh/ezyang/3204/head         -> origin/gh/ezyang/3204/head
2025-12-04T09:17:18.3358994Z  * [new branch]              gh/ezyang/3204/orig         -> origin/gh/ezyang/3204/orig
2025-12-04T09:17:18.3361607Z  * [new branch]              gh/ezyang/3205/base         -> origin/gh/ezyang/3205/base
2025-12-04T09:17:18.3363432Z  * [new branch]              gh/ezyang/3205/head         -> origin/gh/ezyang/3205/head
2025-12-04T09:17:18.3365287Z  * [new branch]              gh/ezyang/3205/orig         -> origin/gh/ezyang/3205/orig
2025-12-04T09:17:18.3368065Z  * [new branch]              gh/ezyang/3206/base         -> origin/gh/ezyang/3206/base
2025-12-04T09:17:18.3369959Z  * [new branch]              gh/ezyang/3206/head         -> origin/gh/ezyang/3206/head
2025-12-04T09:17:18.3371814Z  * [new branch]              gh/ezyang/3206/orig         -> origin/gh/ezyang/3206/orig
2025-12-04T09:17:18.3374503Z  * [new branch]              gh/ezyang/3207/base         -> origin/gh/ezyang/3207/base
2025-12-04T09:17:18.3376276Z  * [new branch]              gh/ezyang/3207/head         -> origin/gh/ezyang/3207/head
2025-12-04T09:17:18.3378170Z  * [new branch]              gh/ezyang/3207/orig         -> origin/gh/ezyang/3207/orig
2025-12-04T09:17:18.3381028Z  * [new branch]              gh/ezyang/3208/base         -> origin/gh/ezyang/3208/base
2025-12-04T09:17:18.3382855Z  * [new branch]              gh/ezyang/3208/head         -> origin/gh/ezyang/3208/head
2025-12-04T09:17:18.3384791Z  * [new branch]              gh/ezyang/3208/orig         -> origin/gh/ezyang/3208/orig
2025-12-04T09:17:18.3387459Z  * [new branch]              gh/ezyang/3209/base         -> origin/gh/ezyang/3209/base
2025-12-04T09:17:18.3389411Z  * [new branch]              gh/ezyang/3209/head         -> origin/gh/ezyang/3209/head
2025-12-04T09:17:18.3391258Z  * [new branch]              gh/ezyang/3209/orig         -> origin/gh/ezyang/3209/orig
2025-12-04T09:17:18.3394509Z  * [new branch]              gh/fadara01/3/base          -> origin/gh/fadara01/3/base
2025-12-04T09:17:18.3396334Z  * [new branch]              gh/fadara01/3/head          -> origin/gh/fadara01/3/head
2025-12-04T09:17:18.3398193Z  * [new branch]              gh/fadara01/3/orig          -> origin/gh/fadara01/3/orig
2025-12-04T09:17:18.3400785Z  * [new branch]              gh/fadara01/5/base          -> origin/gh/fadara01/5/base
2025-12-04T09:17:18.3402658Z  * [new branch]              gh/fadara01/5/head          -> origin/gh/fadara01/5/head
2025-12-04T09:17:18.3404561Z  * [new branch]              gh/fadara01/5/orig          -> origin/gh/fadara01/5/orig
2025-12-04T09:17:18.3407059Z  * [new branch]              gh/fadara01/6/base          -> origin/gh/fadara01/6/base
2025-12-04T09:17:18.3409349Z  * [new branch]              gh/fadara01/6/head          -> origin/gh/fadara01/6/head
2025-12-04T09:17:18.3411203Z  * [new branch]              gh/fadara01/6/orig          -> origin/gh/fadara01/6/orig
2025-12-04T09:17:18.3413888Z  * [new branch]              gh/fadara01/7/base          -> origin/gh/fadara01/7/base
2025-12-04T09:17:18.3415661Z  * [new branch]              gh/fadara01/7/head          -> origin/gh/fadara01/7/head
2025-12-04T09:17:18.3417626Z  * [new branch]              gh/fadara01/7/orig          -> origin/gh/fadara01/7/orig
2025-12-04T09:17:18.3420534Z  * [new branch]              gh/fadara01/8/base          -> origin/gh/fadara01/8/base
2025-12-04T09:17:18.3422328Z  * [new branch]              gh/fadara01/8/head          -> origin/gh/fadara01/8/head
2025-12-04T09:17:18.3424203Z  * [new branch]              gh/fadara01/8/orig          -> origin/gh/fadara01/8/orig
2025-12-04T09:17:18.3426732Z  * [new branch]              gh/fadara01/9/base          -> origin/gh/fadara01/9/base
2025-12-04T09:17:18.3428558Z  * [new branch]              gh/fadara01/9/head          -> origin/gh/fadara01/9/head
2025-12-04T09:17:18.3430429Z  * [new branch]              gh/fadara01/9/orig          -> origin/gh/fadara01/9/orig
2025-12-04T09:17:18.3433495Z  * [new branch]              gh/fduwjj/182/base          -> origin/gh/fduwjj/182/base
2025-12-04T09:17:18.3435310Z  * [new branch]              gh/fduwjj/182/head          -> origin/gh/fduwjj/182/head
2025-12-04T09:17:18.3437132Z  * [new branch]              gh/fduwjj/182/orig          -> origin/gh/fduwjj/182/orig
2025-12-04T09:17:18.3439683Z  * [new branch]              gh/fduwjj/211/base          -> origin/gh/fduwjj/211/base
2025-12-04T09:17:18.3441713Z  * [new branch]              gh/fduwjj/211/head          -> origin/gh/fduwjj/211/head
2025-12-04T09:17:18.3443556Z  * [new branch]              gh/fduwjj/211/orig          -> origin/gh/fduwjj/211/orig
2025-12-04T09:17:18.3446061Z  * [new branch]              gh/fduwjj/212/base          -> origin/gh/fduwjj/212/base
2025-12-04T09:17:18.3447903Z  * [new branch]              gh/fduwjj/212/head          -> origin/gh/fduwjj/212/head
2025-12-04T09:17:18.3449699Z  * [new branch]              gh/fduwjj/212/orig          -> origin/gh/fduwjj/212/orig
2025-12-04T09:17:18.3452175Z  * [new branch]              gh/fduwjj/213/base          -> origin/gh/fduwjj/213/base
2025-12-04T09:17:18.3454021Z  * [new branch]              gh/fduwjj/213/head          -> origin/gh/fduwjj/213/head
2025-12-04T09:17:18.3455890Z  * [new branch]              gh/fduwjj/213/orig          -> origin/gh/fduwjj/213/orig
2025-12-04T09:17:18.3458447Z  * [new branch]              gh/fduwjj/226/base          -> origin/gh/fduwjj/226/base
2025-12-04T09:17:18.3460419Z  * [new branch]              gh/fduwjj/226/head          -> origin/gh/fduwjj/226/head
2025-12-04T09:17:18.3462275Z  * [new branch]              gh/fduwjj/226/orig          -> origin/gh/fduwjj/226/orig
2025-12-04T09:17:18.3465050Z  * [new branch]              gh/fduwjj/229/base          -> origin/gh/fduwjj/229/base
2025-12-04T09:17:18.3466938Z  * [new branch]              gh/fduwjj/229/head          -> origin/gh/fduwjj/229/head
2025-12-04T09:17:18.3468837Z  * [new branch]              gh/fduwjj/229/orig          -> origin/gh/fduwjj/229/orig
2025-12-04T09:17:18.3472019Z  * [new branch]              gh/fduwjj/233/base          -> origin/gh/fduwjj/233/base
2025-12-04T09:17:18.3473836Z  * [new branch]              gh/fduwjj/233/head          -> origin/gh/fduwjj/233/head
2025-12-04T09:17:18.3475722Z  * [new branch]              gh/fduwjj/233/orig          -> origin/gh/fduwjj/233/orig
2025-12-04T09:17:18.3478287Z  * [new branch]              gh/fduwjj/234/base          -> origin/gh/fduwjj/234/base
2025-12-04T09:17:18.3480155Z  * [new branch]              gh/fduwjj/234/head          -> origin/gh/fduwjj/234/head
2025-12-04T09:17:18.3482033Z  * [new branch]              gh/fduwjj/234/orig          -> origin/gh/fduwjj/234/orig
2025-12-04T09:17:18.3484810Z  * [new branch]              gh/fduwjj/235/base          -> origin/gh/fduwjj/235/base
2025-12-04T09:17:18.3486809Z  * [new branch]              gh/fduwjj/235/head          -> origin/gh/fduwjj/235/head
2025-12-04T09:17:18.3488627Z  * [new branch]              gh/fduwjj/235/orig          -> origin/gh/fduwjj/235/orig
2025-12-04T09:17:18.3491230Z  * [new branch]              gh/fduwjj/236/base          -> origin/gh/fduwjj/236/base
2025-12-04T09:17:18.3493043Z  * [new branch]              gh/fduwjj/236/head          -> origin/gh/fduwjj/236/head
2025-12-04T09:17:18.3494955Z  * [new branch]              gh/fduwjj/236/orig          -> origin/gh/fduwjj/236/orig
2025-12-04T09:17:18.3497353Z  * [new branch]              gh/fduwjj/237/base          -> origin/gh/fduwjj/237/base
2025-12-04T09:17:18.3499226Z  * [new branch]              gh/fduwjj/237/head          -> origin/gh/fduwjj/237/head
2025-12-04T09:17:18.3501229Z  * [new branch]              gh/fduwjj/237/orig          -> origin/gh/fduwjj/237/orig
2025-12-04T09:17:18.3503697Z  * [new branch]              gh/fduwjj/238/base          -> origin/gh/fduwjj/238/base
2025-12-04T09:17:18.3505650Z  * [new branch]              gh/fduwjj/238/head          -> origin/gh/fduwjj/238/head
2025-12-04T09:17:18.3507512Z  * [new branch]              gh/fduwjj/238/orig          -> origin/gh/fduwjj/238/orig
2025-12-04T09:17:18.3513205Z  * [new branch]              gh/fduwjj/239/base          -> origin/gh/fduwjj/239/base
2025-12-04T09:17:18.3515118Z  * [new branch]              gh/fduwjj/239/head          -> origin/gh/fduwjj/239/head
2025-12-04T09:17:18.3516932Z  * [new branch]              gh/fduwjj/239/orig          -> origin/gh/fduwjj/239/orig
2025-12-04T09:17:18.3520631Z  * [new branch]              gh/fegin/332/base           -> origin/gh/fegin/332/base
2025-12-04T09:17:18.3522465Z  * [new branch]              gh/fegin/332/head           -> origin/gh/fegin/332/head
2025-12-04T09:17:18.3524323Z  * [new branch]              gh/fegin/332/orig           -> origin/gh/fegin/332/orig
2025-12-04T09:17:18.3526790Z  * [new branch]              gh/fegin/333/base           -> origin/gh/fegin/333/base
2025-12-04T09:17:18.3528648Z  * [new branch]              gh/fegin/333/head           -> origin/gh/fegin/333/head
2025-12-04T09:17:18.3530523Z  * [new branch]              gh/fegin/333/orig           -> origin/gh/fegin/333/orig
2025-12-04T09:17:18.3533048Z  * [new branch]              gh/fegin/334/base           -> origin/gh/fegin/334/base
2025-12-04T09:17:18.3534884Z  * [new branch]              gh/fegin/334/head           -> origin/gh/fegin/334/head
2025-12-04T09:17:18.3536825Z  * [new branch]              gh/fegin/334/orig           -> origin/gh/fegin/334/orig
2025-12-04T09:17:18.3539356Z  * [new branch]              gh/fegin/335/base           -> origin/gh/fegin/335/base
2025-12-04T09:17:18.3541358Z  * [new branch]              gh/fegin/335/head           -> origin/gh/fegin/335/head
2025-12-04T09:17:18.3543161Z  * [new branch]              gh/fegin/335/orig           -> origin/gh/fegin/335/orig
2025-12-04T09:17:18.3546263Z  * [new branch]              gh/fffrog/160/base          -> origin/gh/fffrog/160/base
2025-12-04T09:17:18.3548139Z  * [new branch]              gh/fffrog/160/head          -> origin/gh/fffrog/160/head
2025-12-04T09:17:18.3550634Z  * [new branch]              gh/fffrog/177/base          -> origin/gh/fffrog/177/base
2025-12-04T09:17:18.3552405Z  * [new branch]              gh/fffrog/177/head          -> origin/gh/fffrog/177/head
2025-12-04T09:17:18.3554353Z  * [new branch]              gh/fffrog/177/orig          -> origin/gh/fffrog/177/orig
2025-12-04T09:17:18.3557349Z  * [new branch]              gh/fffrog/178/base          -> origin/gh/fffrog/178/base
2025-12-04T09:17:18.3559128Z  * [new branch]              gh/fffrog/178/head          -> origin/gh/fffrog/178/head
2025-12-04T09:17:18.3561099Z  * [new branch]              gh/fffrog/178/orig          -> origin/gh/fffrog/178/orig
2025-12-04T09:17:18.3563615Z  * [new branch]              gh/fffrog/181/base          -> origin/gh/fffrog/181/base
2025-12-04T09:17:18.3565467Z  * [new branch]              gh/fffrog/181/head          -> origin/gh/fffrog/181/head
2025-12-04T09:17:18.3567357Z  * [new branch]              gh/fffrog/181/orig          -> origin/gh/fffrog/181/orig
2025-12-04T09:17:18.3570148Z  * [new branch]              gh/fffrog/183/base          -> origin/gh/fffrog/183/base
2025-12-04T09:17:18.3571846Z  * [new branch]              gh/fffrog/183/head          -> origin/gh/fffrog/183/head
2025-12-04T09:17:18.3573553Z  * [new branch]              gh/fffrog/183/orig          -> origin/gh/fffrog/183/orig
2025-12-04T09:17:18.3576723Z  * [new branch]              gh/fxdawnn/10/base          -> origin/gh/fxdawnn/10/base
2025-12-04T09:17:18.3578868Z  * [new branch]              gh/fxdawnn/10/head          -> origin/gh/fxdawnn/10/head
2025-12-04T09:17:18.3581141Z  * [new branch]              gh/fxdawnn/10/orig          -> origin/gh/fxdawnn/10/orig
2025-12-04T09:17:18.3584415Z  * [new branch]              gh/fxdawnn/11/base          -> origin/gh/fxdawnn/11/base
2025-12-04T09:17:18.3586783Z  * [new branch]              gh/fxdawnn/11/head          -> origin/gh/fxdawnn/11/head
2025-12-04T09:17:18.3589574Z  * [new branch]              gh/fxdawnn/11/orig          -> origin/gh/fxdawnn/11/orig
2025-12-04T09:17:18.3591872Z  * [new branch]              gh/fxdawnn/12/base          -> origin/gh/fxdawnn/12/base
2025-12-04T09:17:18.3593781Z  * [new branch]              gh/fxdawnn/12/head          -> origin/gh/fxdawnn/12/head
2025-12-04T09:17:18.3595610Z  * [new branch]              gh/fxdawnn/12/orig          -> origin/gh/fxdawnn/12/orig
2025-12-04T09:17:18.3598204Z  * [new branch]              gh/fxdawnn/13/base          -> origin/gh/fxdawnn/13/base
2025-12-04T09:17:18.3600108Z  * [new branch]              gh/fxdawnn/13/head          -> origin/gh/fxdawnn/13/head
2025-12-04T09:17:18.3601936Z  * [new branch]              gh/fxdawnn/13/orig          -> origin/gh/fxdawnn/13/orig
2025-12-04T09:17:18.3604563Z  * [new branch]              gh/fxdawnn/14/base          -> origin/gh/fxdawnn/14/base
2025-12-04T09:17:18.3606373Z  * [new branch]              gh/fxdawnn/14/head          -> origin/gh/fxdawnn/14/head
2025-12-04T09:17:18.3608470Z  * [new branch]              gh/fxdawnn/14/orig          -> origin/gh/fxdawnn/14/orig
2025-12-04T09:17:18.3610950Z  * [new branch]              gh/fxdawnn/15/base          -> origin/gh/fxdawnn/15/base
2025-12-04T09:17:18.3612796Z  * [new branch]              gh/fxdawnn/15/head          -> origin/gh/fxdawnn/15/head
2025-12-04T09:17:18.3614514Z  * [new branch]              gh/fxdawnn/15/orig          -> origin/gh/fxdawnn/15/orig
2025-12-04T09:17:18.3617022Z  * [new branch]              gh/fxdawnn/6/base           -> origin/gh/fxdawnn/6/base
2025-12-04T09:17:18.3618852Z  * [new branch]              gh/fxdawnn/6/head           -> origin/gh/fxdawnn/6/head
2025-12-04T09:17:18.3621003Z  * [new branch]              gh/fxdawnn/6/orig           -> origin/gh/fxdawnn/6/orig
2025-12-04T09:17:18.3623594Z  * [new branch]              gh/fxdawnn/7/base           -> origin/gh/fxdawnn/7/base
2025-12-04T09:17:18.3625535Z  * [new branch]              gh/fxdawnn/7/head           -> origin/gh/fxdawnn/7/head
2025-12-04T09:17:18.3627212Z  * [new branch]              gh/fxdawnn/7/orig           -> origin/gh/fxdawnn/7/orig
2025-12-04T09:17:18.3630185Z  * [new branch]              gh/fxdawnn/9/base           -> origin/gh/fxdawnn/9/base
2025-12-04T09:17:18.3631595Z  * [new branch]              gh/fxdawnn/9/head           -> origin/gh/fxdawnn/9/head
2025-12-04T09:17:18.3633438Z  * [new branch]              gh/fxdawnn/9/orig           -> origin/gh/fxdawnn/9/orig
2025-12-04T09:17:18.3636504Z  * [new branch]              gh/galv/1/base              -> origin/gh/galv/1/base
2025-12-04T09:17:18.3638396Z  * [new branch]              gh/galv/1/head              -> origin/gh/galv/1/head
2025-12-04T09:17:18.3640293Z  * [new branch]              gh/galv/1/orig              -> origin/gh/galv/1/orig
2025-12-04T09:17:18.3642790Z  * [new branch]              gh/galv/2/base              -> origin/gh/galv/2/base
2025-12-04T09:17:18.3644694Z  * [new branch]              gh/galv/2/head              -> origin/gh/galv/2/head
2025-12-04T09:17:18.3646716Z  * [new branch]              gh/galv/2/orig              -> origin/gh/galv/2/orig
2025-12-04T09:17:18.3649338Z  * [new branch]              gh/galv/3/base              -> origin/gh/galv/3/base
2025-12-04T09:17:18.3651035Z  * [new branch]              gh/galv/3/head              -> origin/gh/galv/3/head
2025-12-04T09:17:18.3652988Z  * [new branch]              gh/galv/3/orig              -> origin/gh/galv/3/orig
2025-12-04T09:17:18.3656038Z  * [new branch]              gh/guangyey/134/base        -> origin/gh/guangyey/134/base
2025-12-04T09:17:18.3657923Z  * [new branch]              gh/guangyey/134/head        -> origin/gh/guangyey/134/head
2025-12-04T09:17:18.3659836Z  * [new branch]              gh/guangyey/134/orig        -> origin/gh/guangyey/134/orig
2025-12-04T09:17:18.3662421Z  * [new branch]              gh/guangyey/163/base        -> origin/gh/guangyey/163/base
2025-12-04T09:17:18.3664252Z  * [new branch]              gh/guangyey/163/head        -> origin/gh/guangyey/163/head
2025-12-04T09:17:18.3666138Z  * [new branch]              gh/guangyey/163/orig        -> origin/gh/guangyey/163/orig
2025-12-04T09:17:18.3668772Z  * [new branch]              gh/guangyey/168/base        -> origin/gh/guangyey/168/base
2025-12-04T09:17:18.3670714Z  * [new branch]              gh/guangyey/168/head        -> origin/gh/guangyey/168/head
2025-12-04T09:17:18.3672546Z  * [new branch]              gh/guangyey/168/orig        -> origin/gh/guangyey/168/orig
2025-12-04T09:17:18.3675662Z  * [new branch]              gh/guangyey/169/base        -> origin/gh/guangyey/169/base
2025-12-04T09:17:18.3677556Z  * [new branch]              gh/guangyey/169/head        -> origin/gh/guangyey/169/head
2025-12-04T09:17:18.3679430Z  * [new branch]              gh/guangyey/169/orig        -> origin/gh/guangyey/169/orig
2025-12-04T09:17:18.3681994Z  * [new branch]              gh/guangyey/170/base        -> origin/gh/guangyey/170/base
2025-12-04T09:17:18.3683882Z  * [new branch]              gh/guangyey/170/head        -> origin/gh/guangyey/170/head
2025-12-04T09:17:18.3685692Z  * [new branch]              gh/guangyey/170/orig        -> origin/gh/guangyey/170/orig
2025-12-04T09:17:18.3688212Z  * [new branch]              gh/guangyey/171/base        -> origin/gh/guangyey/171/base
2025-12-04T09:17:18.3690114Z  * [new branch]              gh/guangyey/171/head        -> origin/gh/guangyey/171/head
2025-12-04T09:17:18.3691995Z  * [new branch]              gh/guangyey/171/orig        -> origin/gh/guangyey/171/orig
2025-12-04T09:17:18.3694524Z  * [new branch]              gh/guangyey/178/base        -> origin/gh/guangyey/178/base
2025-12-04T09:17:18.3696668Z  * [new branch]              gh/guangyey/178/head        -> origin/gh/guangyey/178/head
2025-12-04T09:17:18.3698535Z  * [new branch]              gh/guangyey/178/orig        -> origin/gh/guangyey/178/orig
2025-12-04T09:17:18.3701317Z  * [new branch]              gh/guangyey/182/base        -> origin/gh/guangyey/182/base
2025-12-04T09:17:18.3703195Z  * [new branch]              gh/guangyey/182/head        -> origin/gh/guangyey/182/head
2025-12-04T09:17:18.3705050Z  * [new branch]              gh/guangyey/182/orig        -> origin/gh/guangyey/182/orig
2025-12-04T09:17:18.3707550Z  * [new branch]              gh/guangyey/183/base        -> origin/gh/guangyey/183/base
2025-12-04T09:17:18.3709698Z  * [new branch]              gh/guangyey/183/head        -> origin/gh/guangyey/183/head
2025-12-04T09:17:18.3711610Z  * [new branch]              gh/guangyey/183/orig        -> origin/gh/guangyey/183/orig
2025-12-04T09:17:18.3714139Z  * [new branch]              gh/guangyey/185/base        -> origin/gh/guangyey/185/base
2025-12-04T09:17:18.3715954Z  * [new branch]              gh/guangyey/185/head        -> origin/gh/guangyey/185/head
2025-12-04T09:17:18.3717846Z  * [new branch]              gh/guangyey/185/orig        -> origin/gh/guangyey/185/orig
2025-12-04T09:17:18.3720500Z  * [new branch]              gh/guangyey/186/base        -> origin/gh/guangyey/186/base
2025-12-04T09:17:18.3722418Z  * [new branch]              gh/guangyey/186/head        -> origin/gh/guangyey/186/head
2025-12-04T09:17:18.3724347Z  * [new branch]              gh/guangyey/186/orig        -> origin/gh/guangyey/186/orig
2025-12-04T09:17:18.3726882Z  * [new branch]              gh/guangyey/187/base        -> origin/gh/guangyey/187/base
2025-12-04T09:17:18.3728711Z  * [new branch]              gh/guangyey/187/head        -> origin/gh/guangyey/187/head
2025-12-04T09:17:18.3730588Z  * [new branch]              gh/guangyey/187/orig        -> origin/gh/guangyey/187/orig
2025-12-04T09:17:18.3733267Z  * [new branch]              gh/guangyey/188/base        -> origin/gh/guangyey/188/base
2025-12-04T09:17:18.3736225Z  * [new branch]              gh/guangyey/188/head        -> origin/gh/guangyey/188/head
2025-12-04T09:17:18.3738199Z  * [new branch]              gh/guangyey/188/orig        -> origin/gh/guangyey/188/orig
2025-12-04T09:17:18.3740605Z  * [new branch]              gh/guangyey/190/base        -> origin/gh/guangyey/190/base
2025-12-04T09:17:18.3742703Z  * [new branch]              gh/guangyey/190/head        -> origin/gh/guangyey/190/head
2025-12-04T09:17:18.3745080Z  * [new branch]              gh/guangyey/190/orig        -> origin/gh/guangyey/190/orig
2025-12-04T09:17:18.3748039Z  * [new branch]              gh/guangyey/208/base        -> origin/gh/guangyey/208/base
2025-12-04T09:17:18.3748915Z  * [new branch]              gh/guangyey/208/head        -> origin/gh/guangyey/208/head
2025-12-04T09:17:18.3751096Z  * [new branch]              gh/guangyey/208/orig        -> origin/gh/guangyey/208/orig
2025-12-04T09:17:18.3753507Z  * [new branch]              gh/guangyey/228/base        -> origin/gh/guangyey/228/base
2025-12-04T09:17:18.3755386Z  * [new branch]              gh/guangyey/228/head        -> origin/gh/guangyey/228/head
2025-12-04T09:17:18.3757263Z  * [new branch]              gh/guangyey/228/orig        -> origin/gh/guangyey/228/orig
2025-12-04T09:17:18.3760484Z  * [new branch]              gh/guangyey/230/base        -> origin/gh/guangyey/230/base
2025-12-04T09:17:18.3762336Z  * [new branch]              gh/guangyey/230/head        -> origin/gh/guangyey/230/head
2025-12-04T09:17:18.3764125Z  * [new branch]              gh/guangyey/230/orig        -> origin/gh/guangyey/230/orig
2025-12-04T09:17:18.3766813Z  * [new branch]              gh/guangyey/231/base        -> origin/gh/guangyey/231/base
2025-12-04T09:17:18.3768640Z  * [new branch]              gh/guangyey/231/head        -> origin/gh/guangyey/231/head
2025-12-04T09:17:18.3770591Z  * [new branch]              gh/guangyey/231/orig        -> origin/gh/guangyey/231/orig
2025-12-04T09:17:18.3773200Z  * [new branch]              gh/guangyey/232/base        -> origin/gh/guangyey/232/base
2025-12-04T09:17:18.3775023Z  * [new branch]              gh/guangyey/232/head        -> origin/gh/guangyey/232/head
2025-12-04T09:17:18.3776813Z  * [new branch]              gh/guangyey/232/orig        -> origin/gh/guangyey/232/orig
2025-12-04T09:17:18.3779524Z  * [new branch]              gh/guangyey/233/base        -> origin/gh/guangyey/233/base
2025-12-04T09:17:18.3781393Z  * [new branch]              gh/guangyey/233/head        -> origin/gh/guangyey/233/head
2025-12-04T09:17:18.3783196Z  * [new branch]              gh/guangyey/233/orig        -> origin/gh/guangyey/233/orig
2025-12-04T09:17:18.3785747Z  * [new branch]              gh/guangyey/234/base        -> origin/gh/guangyey/234/base
2025-12-04T09:17:18.3787626Z  * [new branch]              gh/guangyey/234/head        -> origin/gh/guangyey/234/head
2025-12-04T09:17:18.3789443Z  * [new branch]              gh/guangyey/234/orig        -> origin/gh/guangyey/234/orig
2025-12-04T09:17:18.3792068Z  * [new branch]              gh/guangyey/235/base        -> origin/gh/guangyey/235/base
2025-12-04T09:17:18.3793861Z  * [new branch]              gh/guangyey/235/head        -> origin/gh/guangyey/235/head
2025-12-04T09:17:18.3795712Z  * [new branch]              gh/guangyey/235/orig        -> origin/gh/guangyey/235/orig
2025-12-04T09:17:18.3798277Z  * [new branch]              gh/guangyey/236/base        -> origin/gh/guangyey/236/base
2025-12-04T09:17:18.3800371Z  * [new branch]              gh/guangyey/236/head        -> origin/gh/guangyey/236/head
2025-12-04T09:17:18.3802146Z  * [new branch]              gh/guangyey/236/orig        -> origin/gh/guangyey/236/orig
2025-12-04T09:17:18.3804720Z  * [new branch]              gh/guangyey/237/base        -> origin/gh/guangyey/237/base
2025-12-04T09:17:18.3806966Z  * [new branch]              gh/guangyey/237/head        -> origin/gh/guangyey/237/head
2025-12-04T09:17:18.3808891Z  * [new branch]              gh/guangyey/237/orig        -> origin/gh/guangyey/237/orig
2025-12-04T09:17:18.3811621Z  * [new branch]              gh/guangyey/238/base        -> origin/gh/guangyey/238/base
2025-12-04T09:17:18.3813431Z  * [new branch]              gh/guangyey/238/head        -> origin/gh/guangyey/238/head
2025-12-04T09:17:18.3816009Z  * [new branch]              gh/guangyey/239/base        -> origin/gh/guangyey/239/base
2025-12-04T09:17:18.3817844Z  * [new branch]              gh/guangyey/239/head        -> origin/gh/guangyey/239/head
2025-12-04T09:17:18.3819748Z  * [new branch]              gh/guangyey/239/orig        -> origin/gh/guangyey/239/orig
2025-12-04T09:17:18.3822367Z  * [new branch]              gh/guangyey/240/base        -> origin/gh/guangyey/240/base
2025-12-04T09:17:18.3824147Z  * [new branch]              gh/guangyey/240/head        -> origin/gh/guangyey/240/head
2025-12-04T09:17:18.3826149Z  * [new branch]              gh/guangyey/240/orig        -> origin/gh/guangyey/240/orig
2025-12-04T09:17:18.3828710Z  * [new branch]              gh/guangyey/241/base        -> origin/gh/guangyey/241/base
2025-12-04T09:17:18.3830569Z  * [new branch]              gh/guangyey/241/head        -> origin/gh/guangyey/241/head
2025-12-04T09:17:18.3832526Z  * [new branch]              gh/guangyey/241/orig        -> origin/gh/guangyey/241/orig
2025-12-04T09:17:18.3835062Z  * [new branch]              gh/guangyey/242/base        -> origin/gh/guangyey/242/base
2025-12-04T09:17:18.3836924Z  * [new branch]              gh/guangyey/242/head        -> origin/gh/guangyey/242/head
2025-12-04T09:17:18.3838751Z  * [new branch]              gh/guangyey/242/orig        -> origin/gh/guangyey/242/orig
2025-12-04T09:17:18.3841409Z  * [new branch]              gh/guangyey/243/base        -> origin/gh/guangyey/243/base
2025-12-04T09:17:18.3843235Z  * [new branch]              gh/guangyey/243/head        -> origin/gh/guangyey/243/head
2025-12-04T09:17:18.3845043Z  * [new branch]              gh/guangyey/243/orig        -> origin/gh/guangyey/243/orig
2025-12-04T09:17:18.3847719Z  * [new branch]              gh/guangyey/244/base        -> origin/gh/guangyey/244/base
2025-12-04T09:17:18.3849548Z  * [new branch]              gh/guangyey/244/head        -> origin/gh/guangyey/244/head
2025-12-04T09:17:18.3851474Z  * [new branch]              gh/guangyey/244/orig        -> origin/gh/guangyey/244/orig
2025-12-04T09:17:18.3854065Z  * [new branch]              gh/guangyey/245/base        -> origin/gh/guangyey/245/base
2025-12-04T09:17:18.3855898Z  * [new branch]              gh/guangyey/245/head        -> origin/gh/guangyey/245/head
2025-12-04T09:17:18.3857732Z  * [new branch]              gh/guangyey/245/orig        -> origin/gh/guangyey/245/orig
2025-12-04T09:17:18.3860808Z  * [new branch]              gh/guangyey/246/base        -> origin/gh/guangyey/246/base
2025-12-04T09:17:18.3862093Z  * [new branch]              gh/guangyey/246/head        -> origin/gh/guangyey/246/head
2025-12-04T09:17:18.3864092Z  * [new branch]              gh/guangyey/246/orig        -> origin/gh/guangyey/246/orig
2025-12-04T09:17:18.3866694Z  * [new branch]              gh/guangyey/247/base        -> origin/gh/guangyey/247/base
2025-12-04T09:17:18.3868559Z  * [new branch]              gh/guangyey/247/head        -> origin/gh/guangyey/247/head
2025-12-04T09:17:18.3870370Z  * [new branch]              gh/guangyey/247/orig        -> origin/gh/guangyey/247/orig
2025-12-04T09:17:18.3873022Z  * [new branch]              gh/guangyey/248/base        -> origin/gh/guangyey/248/base
2025-12-04T09:17:18.3875020Z  * [new branch]              gh/guangyey/248/head        -> origin/gh/guangyey/248/head
2025-12-04T09:17:18.3876810Z  * [new branch]              gh/guangyey/248/orig        -> origin/gh/guangyey/248/orig
2025-12-04T09:17:18.3879337Z  * [new branch]              gh/guangyey/249/base        -> origin/gh/guangyey/249/base
2025-12-04T09:17:18.3881341Z  * [new branch]              gh/guangyey/249/head        -> origin/gh/guangyey/249/head
2025-12-04T09:17:18.3883175Z  * [new branch]              gh/guangyey/249/orig        -> origin/gh/guangyey/249/orig
2025-12-04T09:17:18.3886330Z  * [new branch]              gh/guangyey/250/base        -> origin/gh/guangyey/250/base
2025-12-04T09:17:18.3888209Z  * [new branch]              gh/guangyey/250/head        -> origin/gh/guangyey/250/head
2025-12-04T09:17:18.3890027Z  * [new branch]              gh/guangyey/250/orig        -> origin/gh/guangyey/250/orig
2025-12-04T09:17:18.3892517Z  * [new branch]              gh/guangyey/251/base        -> origin/gh/guangyey/251/base
2025-12-04T09:17:18.3894356Z  * [new branch]              gh/guangyey/251/head        -> origin/gh/guangyey/251/head
2025-12-04T09:17:18.3896155Z  * [new branch]              gh/guangyey/251/orig        -> origin/gh/guangyey/251/orig
2025-12-04T09:17:18.3898838Z  * [new branch]              gh/guangyey/252/base        -> origin/gh/guangyey/252/base
2025-12-04T09:17:18.3900765Z  * [new branch]              gh/guangyey/252/head        -> origin/gh/guangyey/252/head
2025-12-04T09:17:18.3902700Z  * [new branch]              gh/guangyey/252/orig        -> origin/gh/guangyey/252/orig
2025-12-04T09:17:18.3905249Z  * [new branch]              gh/guangyey/253/base        -> origin/gh/guangyey/253/base
2025-12-04T09:17:18.3907041Z  * [new branch]              gh/guangyey/253/head        -> origin/gh/guangyey/253/head
2025-12-04T09:17:18.3908878Z  * [new branch]              gh/guangyey/253/orig        -> origin/gh/guangyey/253/orig
2025-12-04T09:17:18.3913716Z  * [new branch]              gh/guangyey/254/base        -> origin/gh/guangyey/254/base
2025-12-04T09:17:18.3915582Z  * [new branch]              gh/guangyey/254/head        -> origin/gh/guangyey/254/head
2025-12-04T09:17:18.3917374Z  * [new branch]              gh/guangyey/254/orig        -> origin/gh/guangyey/254/orig
2025-12-04T09:17:18.3923786Z  * [new branch]              gh/guangyey/255/base        -> origin/gh/guangyey/255/base
2025-12-04T09:17:18.3924603Z  * [new branch]              gh/guangyey/255/head        -> origin/gh/guangyey/255/head
2025-12-04T09:17:18.3925143Z  * [new branch]              gh/guangyey/255/orig        -> origin/gh/guangyey/255/orig
2025-12-04T09:17:18.3926939Z  * [new branch]              gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base
2025-12-04T09:17:18.3929245Z  * [new branch]              gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head
2025-12-04T09:17:18.3930520Z  * [new branch]              gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig
2025-12-04T09:17:18.3933133Z  * [new branch]              gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base
2025-12-04T09:17:18.3934955Z  * [new branch]              gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head
2025-12-04T09:17:18.3937009Z  * [new branch]              gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig
2025-12-04T09:17:18.3940194Z  * [new branch]              gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base
2025-12-04T09:17:18.3944413Z  * [new branch]              gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head
2025-12-04T09:17:18.3947816Z  * [new branch]              gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig
2025-12-04T09:17:18.3950856Z  * [new branch]              gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base
2025-12-04T09:17:18.3953291Z  * [new branch]              gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head
2025-12-04T09:17:18.3955969Z  * [new branch]              gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig
2025-12-04T09:17:18.3958576Z  * [new branch]              gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base
2025-12-04T09:17:18.3960363Z  * [new branch]              gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head
2025-12-04T09:17:18.3962222Z  * [new branch]              gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig
2025-12-04T09:17:18.3964873Z  * [new branch]              gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base
2025-12-04T09:17:18.3966772Z  * [new branch]              gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head
2025-12-04T09:17:18.3969134Z  * [new branch]              gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig
2025-12-04T09:17:18.3971755Z  * [new branch]              gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base
2025-12-04T09:17:18.3973627Z  * [new branch]              gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head
2025-12-04T09:17:18.3975525Z  * [new branch]              gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig
2025-12-04T09:17:18.3978046Z  * [new branch]              gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base
2025-12-04T09:17:18.3980143Z  * [new branch]              gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head
2025-12-04T09:17:18.3981947Z  * [new branch]              gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig
2025-12-04T09:17:18.3984535Z  * [new branch]              gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base
2025-12-04T09:17:18.3986714Z  * [new branch]              gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head
2025-12-04T09:17:18.3988745Z  * [new branch]              gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig
2025-12-04T09:17:18.3991309Z  * [new branch]              gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base
2025-12-04T09:17:18.3993182Z  * [new branch]              gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head
2025-12-04T09:17:18.3995069Z  * [new branch]              gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig
2025-12-04T09:17:18.3997582Z  * [new branch]              gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base
2025-12-04T09:17:18.3999441Z  * [new branch]              gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head
2025-12-04T09:17:18.4001303Z  * [new branch]              gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig
2025-12-04T09:17:18.4003812Z  * [new branch]              gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base
2025-12-04T09:17:18.4005607Z  * [new branch]              gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head
2025-12-04T09:17:18.4007437Z  * [new branch]              gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig
2025-12-04T09:17:18.4010307Z  * [new branch]              gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base
2025-12-04T09:17:18.4012144Z  * [new branch]              gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head
2025-12-04T09:17:18.4013928Z  * [new branch]              gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig
2025-12-04T09:17:18.4016423Z  * [new branch]              gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base
2025-12-04T09:17:18.4018244Z  * [new branch]              gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head
2025-12-04T09:17:18.4020214Z  * [new branch]              gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig
2025-12-04T09:17:18.4023216Z  * [new branch]              gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base
2025-12-04T09:17:18.4025125Z  * [new branch]              gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head
2025-12-04T09:17:18.4026945Z  * [new branch]              gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig
2025-12-04T09:17:18.4029573Z  * [new branch]              gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base
2025-12-04T09:17:18.4031146Z  * [new branch]              gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head
2025-12-04T09:17:18.4033154Z  * [new branch]              gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig
2025-12-04T09:17:18.4036259Z  * [new branch]              gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base
2025-12-04T09:17:18.4038168Z  * [new branch]              gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head
2025-12-04T09:17:18.4040108Z  * [new branch]              gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig
2025-12-04T09:17:18.4042770Z  * [new branch]              gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base
2025-12-04T09:17:18.4044575Z  * [new branch]              gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head
2025-12-04T09:17:18.4046456Z  * [new branch]              gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig
2025-12-04T09:17:18.4048988Z  * [new branch]              gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base
2025-12-04T09:17:18.4050848Z  * [new branch]              gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head
2025-12-04T09:17:18.4052932Z  * [new branch]              gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig
2025-12-04T09:17:18.4055333Z  * [new branch]              gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base
2025-12-04T09:17:18.4057415Z  * [new branch]              gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head
2025-12-04T09:17:18.4058793Z  * [new branch]              gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig
2025-12-04T09:17:18.4061733Z  * [new branch]              gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base
2025-12-04T09:17:18.4063701Z  * [new branch]              gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head
2025-12-04T09:17:18.4065657Z  * [new branch]              gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig
2025-12-04T09:17:18.4068213Z  * [new branch]              gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base
2025-12-04T09:17:18.4069992Z  * [new branch]              gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head
2025-12-04T09:17:18.4071949Z  * [new branch]              gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig
2025-12-04T09:17:18.4074477Z  * [new branch]              gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base
2025-12-04T09:17:18.4076295Z  * [new branch]              gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head
2025-12-04T09:17:18.4078101Z  * [new branch]              gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig
2025-12-04T09:17:18.4080698Z  * [new branch]              gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base
2025-12-04T09:17:18.4084827Z  * [new branch]              gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head
2025-12-04T09:17:18.4085655Z  * [new branch]              gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig
2025-12-04T09:17:18.4087613Z  * [new branch]              gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base
2025-12-04T09:17:18.4089359Z  * [new branch]              gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head
2025-12-04T09:17:18.4091172Z  * [new branch]              gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig
2025-12-04T09:17:18.4093754Z  * [new branch]              gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base
2025-12-04T09:17:18.4095666Z  * [new branch]              gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head
2025-12-04T09:17:18.4098029Z  * [new branch]              gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig
2025-12-04T09:17:18.4101379Z  * [new branch]              gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base
2025-12-04T09:17:18.4102415Z  * [new branch]              gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head
2025-12-04T09:17:18.4104612Z  * [new branch]              gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig
2025-12-04T09:17:18.4107218Z  * [new branch]              gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base
2025-12-04T09:17:18.4109070Z  * [new branch]              gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head
2025-12-04T09:17:18.4111049Z  * [new branch]              gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig
2025-12-04T09:17:18.4113626Z  * [new branch]              gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base
2025-12-04T09:17:18.4115582Z  * [new branch]              gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head
2025-12-04T09:17:18.4117478Z  * [new branch]              gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig
2025-12-04T09:17:18.4120112Z  * [new branch]              gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base
2025-12-04T09:17:18.4122036Z  * [new branch]              gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head
2025-12-04T09:17:18.4123915Z  * [new branch]              gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig
2025-12-04T09:17:18.4126637Z  * [new branch]              gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base
2025-12-04T09:17:18.4128443Z  * [new branch]              gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head
2025-12-04T09:17:18.4130866Z  * [new branch]              gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig
2025-12-04T09:17:18.4133964Z  * [new branch]              gh/hameerabbasi/1/base      -> origin/gh/hameerabbasi/1/base
2025-12-04T09:17:18.4135829Z  * [new branch]              gh/hameerabbasi/1/head      -> origin/gh/hameerabbasi/1/head
2025-12-04T09:17:18.4138283Z  * [new branch]              gh/hameerabbasi/2/base      -> origin/gh/hameerabbasi/2/base
2025-12-04T09:17:18.4140393Z  * [new branch]              gh/hameerabbasi/2/head      -> origin/gh/hameerabbasi/2/head
2025-12-04T09:17:18.4142269Z  * [new branch]              gh/hameerabbasi/2/orig      -> origin/gh/hameerabbasi/2/orig
2025-12-04T09:17:18.4145229Z  * [new branch]              gh/hameerabbasi/3/base      -> origin/gh/hameerabbasi/3/base
2025-12-04T09:17:18.4147104Z  * [new branch]              gh/hameerabbasi/3/head      -> origin/gh/hameerabbasi/3/head
2025-12-04T09:17:18.4149158Z  * [new branch]              gh/hameerabbasi/3/orig      -> origin/gh/hameerabbasi/3/orig
2025-12-04T09:17:18.4151577Z  * [new branch]              gh/hameerabbasi/4/base      -> origin/gh/hameerabbasi/4/base
2025-12-04T09:17:18.4153461Z  * [new branch]              gh/hameerabbasi/4/head      -> origin/gh/hameerabbasi/4/head
2025-12-04T09:17:18.4155253Z  * [new branch]              gh/hameerabbasi/4/orig      -> origin/gh/hameerabbasi/4/orig
2025-12-04T09:17:18.4158410Z  * [new branch]              gh/huydhn/1/next            -> origin/gh/huydhn/1/next
2025-12-04T09:17:18.4160781Z  * [new branch]              gh/huydhn/2/next            -> origin/gh/huydhn/2/next
2025-12-04T09:17:18.4163432Z  * [new branch]              gh/huydhn/3/next            -> origin/gh/huydhn/3/next
2025-12-04T09:17:18.4165981Z  * [new branch]              gh/huydhn/4/next            -> origin/gh/huydhn/4/next
2025-12-04T09:17:18.4168482Z  * [new branch]              gh/huydhn/5/next            -> origin/gh/huydhn/5/next
2025-12-04T09:17:18.4170989Z  * [new branch]              gh/huydhn/6/next            -> origin/gh/huydhn/6/next
2025-12-04T09:17:18.4174057Z  * [new branch]              gh/int3/97/base             -> origin/gh/int3/97/base
2025-12-04T09:17:18.4175976Z  * [new branch]              gh/int3/97/head             -> origin/gh/int3/97/head
2025-12-04T09:17:18.4179397Z  * [new branch]              gh/isuruf/101/base          -> origin/gh/isuruf/101/base
2025-12-04T09:17:18.4181244Z  * [new branch]              gh/isuruf/101/head          -> origin/gh/isuruf/101/head
2025-12-04T09:17:18.4183831Z  * [new branch]              gh/isuruf/146/base          -> origin/gh/isuruf/146/base
2025-12-04T09:17:18.4185939Z  * [new branch]              gh/isuruf/146/head          -> origin/gh/isuruf/146/head
2025-12-04T09:17:18.4187542Z  * [new branch]              gh/isuruf/146/orig          -> origin/gh/isuruf/146/orig
2025-12-04T09:17:18.4190648Z  * [new branch]              gh/isuruf/158/base          -> origin/gh/isuruf/158/base
2025-12-04T09:17:18.4192510Z  * [new branch]              gh/isuruf/158/head          -> origin/gh/isuruf/158/head
2025-12-04T09:17:18.4195049Z  * [new branch]              gh/isuruf/159/base          -> origin/gh/isuruf/159/base
2025-12-04T09:17:18.4196808Z  * [new branch]              gh/isuruf/159/head          -> origin/gh/isuruf/159/head
2025-12-04T09:17:18.4199362Z  * [new branch]              gh/isuruf/160/base          -> origin/gh/isuruf/160/base
2025-12-04T09:17:18.4201210Z  * [new branch]              gh/isuruf/160/head          -> origin/gh/isuruf/160/head
2025-12-04T09:17:18.4203094Z  * [new branch]              gh/isuruf/160/orig          -> origin/gh/isuruf/160/orig
2025-12-04T09:17:18.4205692Z  * [new branch]              gh/isuruf/81/base           -> origin/gh/isuruf/81/base
2025-12-04T09:17:18.4207543Z  * [new branch]              gh/isuruf/81/head           -> origin/gh/isuruf/81/head
2025-12-04T09:17:18.4209732Z  * [new branch]              gh/isuruf/81/orig           -> origin/gh/isuruf/81/orig
2025-12-04T09:17:18.4212780Z  * [new branch]              gh/jamesjwu/176/base        -> origin/gh/jamesjwu/176/base
2025-12-04T09:17:18.4214626Z  * [new branch]              gh/jamesjwu/176/head        -> origin/gh/jamesjwu/176/head
2025-12-04T09:17:18.4216523Z  * [new branch]              gh/jamesjwu/176/orig        -> origin/gh/jamesjwu/176/orig
2025-12-04T09:17:18.4219157Z  * [new branch]              gh/jamesjwu/187/base        -> origin/gh/jamesjwu/187/base
2025-12-04T09:17:18.4221595Z  * [new branch]              gh/jamesjwu/187/head        -> origin/gh/jamesjwu/187/head
2025-12-04T09:17:18.4223417Z  * [new branch]              gh/jamesjwu/187/orig        -> origin/gh/jamesjwu/187/orig
2025-12-04T09:17:18.4225971Z  * [new branch]              gh/jamesjwu/196/base        -> origin/gh/jamesjwu/196/base
2025-12-04T09:17:18.4227837Z  * [new branch]              gh/jamesjwu/196/head        -> origin/gh/jamesjwu/196/head
2025-12-04T09:17:18.4229693Z  * [new branch]              gh/jamesjwu/196/orig        -> origin/gh/jamesjwu/196/orig
2025-12-04T09:17:18.4232238Z  * [new branch]              gh/jamesjwu/198/base        -> origin/gh/jamesjwu/198/base
2025-12-04T09:17:18.4234151Z  * [new branch]              gh/jamesjwu/198/head        -> origin/gh/jamesjwu/198/head
2025-12-04T09:17:18.4236066Z  * [new branch]              gh/jamesjwu/198/orig        -> origin/gh/jamesjwu/198/orig
2025-12-04T09:17:18.4238767Z  * [new branch]              gh/jamesjwu/207/base        -> origin/gh/jamesjwu/207/base
2025-12-04T09:17:18.4240798Z  * [new branch]              gh/jamesjwu/207/head        -> origin/gh/jamesjwu/207/head
2025-12-04T09:17:18.4242663Z  * [new branch]              gh/jamesjwu/207/orig        -> origin/gh/jamesjwu/207/orig
2025-12-04T09:17:18.4245290Z  * [new branch]              gh/jamesjwu/208/base        -> origin/gh/jamesjwu/208/base
2025-12-04T09:17:18.4247125Z  * [new branch]              gh/jamesjwu/208/head        -> origin/gh/jamesjwu/208/head
2025-12-04T09:17:18.4248998Z  * [new branch]              gh/jamesjwu/208/orig        -> origin/gh/jamesjwu/208/orig
2025-12-04T09:17:18.4251622Z  * [new branch]              gh/jamesjwu/52/base         -> origin/gh/jamesjwu/52/base
2025-12-04T09:17:18.4253394Z  * [new branch]              gh/jamesjwu/52/head         -> origin/gh/jamesjwu/52/head
2025-12-04T09:17:18.4255983Z  * [new branch]              gh/jamesjwu/53/base         -> origin/gh/jamesjwu/53/base
2025-12-04T09:17:18.4257523Z  * [new branch]              gh/jamesjwu/53/head         -> origin/gh/jamesjwu/53/head
2025-12-04T09:17:18.4260106Z  * [new branch]              gh/jamesjwu/54/base         -> origin/gh/jamesjwu/54/base
2025-12-04T09:17:18.4261931Z  * [new branch]              gh/jamesjwu/54/head         -> origin/gh/jamesjwu/54/head
2025-12-04T09:17:18.4264281Z  * [new branch]              gh/jamesjwu/55/base         -> origin/gh/jamesjwu/55/base
2025-12-04T09:17:18.4266136Z  * [new branch]              gh/jamesjwu/55/head         -> origin/gh/jamesjwu/55/head
2025-12-04T09:17:18.4268481Z  * [new branch]              gh/jamesjwu/56/base         -> origin/gh/jamesjwu/56/base
2025-12-04T09:17:18.4270264Z  * [new branch]              gh/jamesjwu/56/head         -> origin/gh/jamesjwu/56/head
2025-12-04T09:17:18.4272648Z  * [new branch]              gh/jamesjwu/57/base         -> origin/gh/jamesjwu/57/base
2025-12-04T09:17:18.4274451Z  * [new branch]              gh/jamesjwu/57/head         -> origin/gh/jamesjwu/57/head
2025-12-04T09:17:18.4276730Z  * [new branch]              gh/jamesjwu/58/base         -> origin/gh/jamesjwu/58/base
2025-12-04T09:17:18.4278554Z  * [new branch]              gh/jamesjwu/58/head         -> origin/gh/jamesjwu/58/head
2025-12-04T09:17:18.4281032Z  * [new branch]              gh/jamesjwu/59/base         -> origin/gh/jamesjwu/59/base
2025-12-04T09:17:18.4283027Z  * [new branch]              gh/jamesjwu/59/head         -> origin/gh/jamesjwu/59/head
2025-12-04T09:17:18.4285284Z  * [new branch]              gh/jamesjwu/60/base         -> origin/gh/jamesjwu/60/base
2025-12-04T09:17:18.4287100Z  * [new branch]              gh/jamesjwu/60/head         -> origin/gh/jamesjwu/60/head
2025-12-04T09:17:18.4289495Z  * [new branch]              gh/jamesjwu/61/base         -> origin/gh/jamesjwu/61/base
2025-12-04T09:17:18.4291254Z  * [new branch]              gh/jamesjwu/61/head         -> origin/gh/jamesjwu/61/head
2025-12-04T09:17:18.4294531Z  * [new branch]              gh/jamesjwu/62/base         -> origin/gh/jamesjwu/62/base
2025-12-04T09:17:18.4295688Z  * [new branch]              gh/jamesjwu/62/head         -> origin/gh/jamesjwu/62/head
2025-12-04T09:17:18.4298235Z  * [new branch]              gh/jamesjwu/63/base         -> origin/gh/jamesjwu/63/base
2025-12-04T09:17:18.4300790Z  * [new branch]              gh/jamesjwu/63/head         -> origin/gh/jamesjwu/63/head
2025-12-04T09:17:18.4303759Z  * [new branch]              gh/jamesjwu/64/base         -> origin/gh/jamesjwu/64/base
2025-12-04T09:17:18.4305721Z  * [new branch]              gh/jamesjwu/64/head         -> origin/gh/jamesjwu/64/head
2025-12-04T09:17:18.4308293Z  * [new branch]              gh/jamesjwu/65/base         -> origin/gh/jamesjwu/65/base
2025-12-04T09:17:18.4310173Z  * [new branch]              gh/jamesjwu/65/head         -> origin/gh/jamesjwu/65/head
2025-12-04T09:17:18.4313313Z  * [new branch]              gh/janeyx99/165/base        -> origin/gh/janeyx99/165/base
2025-12-04T09:17:18.4315298Z  * [new branch]              gh/janeyx99/165/head        -> origin/gh/janeyx99/165/head
2025-12-04T09:17:18.4317138Z  * [new branch]              gh/janeyx99/165/orig        -> origin/gh/janeyx99/165/orig
2025-12-04T09:17:18.4319423Z  * [new branch]              gh/janeyx99/201/base        -> origin/gh/janeyx99/201/base
2025-12-04T09:17:18.4321244Z  * [new branch]              gh/janeyx99/201/head        -> origin/gh/janeyx99/201/head
2025-12-04T09:17:18.4323077Z  * [new branch]              gh/janeyx99/201/orig        -> origin/gh/janeyx99/201/orig
2025-12-04T09:17:18.4325808Z  * [new branch]              gh/janeyx99/225/base        -> origin/gh/janeyx99/225/base
2025-12-04T09:17:18.4327621Z  * [new branch]              gh/janeyx99/225/head        -> origin/gh/janeyx99/225/head
2025-12-04T09:17:18.4329546Z  * [new branch]              gh/janeyx99/225/orig        -> origin/gh/janeyx99/225/orig
2025-12-04T09:17:18.4332058Z  * [new branch]              gh/janeyx99/299/base        -> origin/gh/janeyx99/299/base
2025-12-04T09:17:18.4334016Z  * [new branch]              gh/janeyx99/299/head        -> origin/gh/janeyx99/299/head
2025-12-04T09:17:18.4335679Z  * [new branch]              gh/janeyx99/299/orig        -> origin/gh/janeyx99/299/orig
2025-12-04T09:17:18.4338549Z  * [new branch]              gh/janeyx99/302/base        -> origin/gh/janeyx99/302/base
2025-12-04T09:17:18.4340583Z  * [new branch]              gh/janeyx99/302/head        -> origin/gh/janeyx99/302/head
2025-12-04T09:17:18.4342891Z  * [new branch]              gh/janeyx99/303/base        -> origin/gh/janeyx99/303/base
2025-12-04T09:17:18.4344670Z  * [new branch]              gh/janeyx99/303/head        -> origin/gh/janeyx99/303/head
2025-12-04T09:17:18.4347220Z  * [new branch]              gh/janeyx99/305/base        -> origin/gh/janeyx99/305/base
2025-12-04T09:17:18.4349105Z  * [new branch]              gh/janeyx99/305/head        -> origin/gh/janeyx99/305/head
2025-12-04T09:17:18.4351424Z  * [new branch]              gh/janeyx99/306/base        -> origin/gh/janeyx99/306/base
2025-12-04T09:17:18.4353233Z  * [new branch]              gh/janeyx99/306/head        -> origin/gh/janeyx99/306/head
2025-12-04T09:17:18.4355712Z  * [new branch]              gh/janeyx99/314/base        -> origin/gh/janeyx99/314/base
2025-12-04T09:17:18.4357640Z  * [new branch]              gh/janeyx99/314/head        -> origin/gh/janeyx99/314/head
2025-12-04T09:17:18.4359453Z  * [new branch]              gh/janeyx99/314/orig        -> origin/gh/janeyx99/314/orig
2025-12-04T09:17:18.4361946Z  * [new branch]              gh/janeyx99/315/base        -> origin/gh/janeyx99/315/base
2025-12-04T09:17:18.4363765Z  * [new branch]              gh/janeyx99/315/head        -> origin/gh/janeyx99/315/head
2025-12-04T09:17:18.4365687Z  * [new branch]              gh/janeyx99/315/orig        -> origin/gh/janeyx99/315/orig
2025-12-04T09:17:18.4368177Z  * [new branch]              gh/janeyx99/316/base        -> origin/gh/janeyx99/316/base
2025-12-04T09:17:18.4370027Z  * [new branch]              gh/janeyx99/316/head        -> origin/gh/janeyx99/316/head
2025-12-04T09:17:18.4372440Z  * [new branch]              gh/janeyx99/316/orig        -> origin/gh/janeyx99/316/orig
2025-12-04T09:17:18.4375040Z  * [new branch]              gh/janeyx99/317/base        -> origin/gh/janeyx99/317/base
2025-12-04T09:17:18.4376838Z  * [new branch]              gh/janeyx99/317/head        -> origin/gh/janeyx99/317/head
2025-12-04T09:17:18.4378771Z  * [new branch]              gh/janeyx99/317/orig        -> origin/gh/janeyx99/317/orig
2025-12-04T09:17:18.4381614Z  * [new branch]              gh/janeyx99/325/base        -> origin/gh/janeyx99/325/base
2025-12-04T09:17:18.4383472Z  * [new branch]              gh/janeyx99/325/head        -> origin/gh/janeyx99/325/head
2025-12-04T09:17:18.4385339Z  * [new branch]              gh/janeyx99/325/orig        -> origin/gh/janeyx99/325/orig
2025-12-04T09:17:18.4387876Z  * [new branch]              gh/janeyx99/327/base        -> origin/gh/janeyx99/327/base
2025-12-04T09:17:18.4389734Z  * [new branch]              gh/janeyx99/327/head        -> origin/gh/janeyx99/327/head
2025-12-04T09:17:18.4391576Z  * [new branch]              gh/janeyx99/327/orig        -> origin/gh/janeyx99/327/orig
2025-12-04T09:17:18.4394069Z  * [new branch]              gh/janeyx99/328/base        -> origin/gh/janeyx99/328/base
2025-12-04T09:17:18.4395929Z  * [new branch]              gh/janeyx99/328/head        -> origin/gh/janeyx99/328/head
2025-12-04T09:17:18.4397757Z  * [new branch]              gh/janeyx99/328/orig        -> origin/gh/janeyx99/328/orig
2025-12-04T09:17:18.4400146Z  * [new branch]              gh/janeyx99/329/base        -> origin/gh/janeyx99/329/base
2025-12-04T09:17:18.4401992Z  * [new branch]              gh/janeyx99/329/head        -> origin/gh/janeyx99/329/head
2025-12-04T09:17:18.4403905Z  * [new branch]              gh/janeyx99/329/orig        -> origin/gh/janeyx99/329/orig
2025-12-04T09:17:18.4406994Z  * [new branch]              gh/janeyx99/330/base        -> origin/gh/janeyx99/330/base
2025-12-04T09:17:18.4409368Z  * [new branch]              gh/janeyx99/330/head        -> origin/gh/janeyx99/330/head
2025-12-04T09:17:18.4413171Z  * [new branch]              gh/janeyx99/330/orig        -> origin/gh/janeyx99/330/orig
2025-12-04T09:17:18.4415740Z  * [new branch]              gh/janeyx99/331/base        -> origin/gh/janeyx99/331/base
2025-12-04T09:17:18.4417568Z  * [new branch]              gh/janeyx99/331/head        -> origin/gh/janeyx99/331/head
2025-12-04T09:17:18.4419424Z  * [new branch]              gh/janeyx99/331/orig        -> origin/gh/janeyx99/331/orig
2025-12-04T09:17:18.4422334Z  * [new branch]              gh/janeyx99/332/base        -> origin/gh/janeyx99/332/base
2025-12-04T09:17:18.4423873Z  * [new branch]              gh/janeyx99/332/head        -> origin/gh/janeyx99/332/head
2025-12-04T09:17:18.4425697Z  * [new branch]              gh/janeyx99/332/orig        -> origin/gh/janeyx99/332/orig
2025-12-04T09:17:18.4428173Z  * [new branch]              gh/janeyx99/333/base        -> origin/gh/janeyx99/333/base
2025-12-04T09:17:18.4429995Z  * [new branch]              gh/janeyx99/333/head        -> origin/gh/janeyx99/333/head
2025-12-04T09:17:18.4431913Z  * [new branch]              gh/janeyx99/333/orig        -> origin/gh/janeyx99/333/orig
2025-12-04T09:17:18.4434581Z  * [new branch]              gh/janeyx99/88/base         -> origin/gh/janeyx99/88/base
2025-12-04T09:17:18.4436386Z  * [new branch]              gh/janeyx99/88/head         -> origin/gh/janeyx99/88/head
2025-12-04T09:17:18.4438205Z  * [new branch]              gh/janeyx99/88/orig         -> origin/gh/janeyx99/88/orig
2025-12-04T09:17:18.4441254Z  * [new branch]              gh/jansel/360/base          -> origin/gh/jansel/360/base
2025-12-04T09:17:18.4443098Z  * [new branch]              gh/jansel/360/head          -> origin/gh/jansel/360/head
2025-12-04T09:17:18.4445569Z  * [new branch]              gh/jansel/451/base          -> origin/gh/jansel/451/base
2025-12-04T09:17:18.4447444Z  * [new branch]              gh/jansel/451/head          -> origin/gh/jansel/451/head
2025-12-04T09:17:18.4449776Z  * [new branch]              gh/jansel/451/orig          -> origin/gh/jansel/451/orig
2025-12-04T09:17:18.4452270Z  * [new branch]              gh/jansel/462/base          -> origin/gh/jansel/462/base
2025-12-04T09:17:18.4454073Z  * [new branch]              gh/jansel/462/head          -> origin/gh/jansel/462/head
2025-12-04T09:17:18.4456052Z  * [new branch]              gh/jansel/462/orig          -> origin/gh/jansel/462/orig
2025-12-04T09:17:18.4459129Z  * [new branch]              gh/jansel/533/base          -> origin/gh/jansel/533/base
2025-12-04T09:17:18.4461055Z  * [new branch]              gh/jansel/533/head          -> origin/gh/jansel/533/head
2025-12-04T09:17:18.4462838Z  * [new branch]              gh/jansel/533/orig          -> origin/gh/jansel/533/orig
2025-12-04T09:17:18.4465340Z  * [new branch]              gh/jansel/552/base          -> origin/gh/jansel/552/base
2025-12-04T09:17:18.4467110Z  * [new branch]              gh/jansel/552/head          -> origin/gh/jansel/552/head
2025-12-04T09:17:18.4468933Z  * [new branch]              gh/jansel/552/orig          -> origin/gh/jansel/552/orig
2025-12-04T09:17:18.4471460Z  * [new branch]              gh/jansel/553/base          -> origin/gh/jansel/553/base
2025-12-04T09:17:18.4473268Z  * [new branch]              gh/jansel/553/head          -> origin/gh/jansel/553/head
2025-12-04T09:17:18.4475070Z  * [new branch]              gh/jansel/553/orig          -> origin/gh/jansel/553/orig
2025-12-04T09:17:18.4478038Z  * [new branch]              gh/jansel/554/base          -> origin/gh/jansel/554/base
2025-12-04T09:17:18.4479876Z  * [new branch]              gh/jansel/554/head          -> origin/gh/jansel/554/head
2025-12-04T09:17:18.4482030Z  * [new branch]              gh/jansel/554/orig          -> origin/gh/jansel/554/orig
2025-12-04T09:17:18.4484547Z  * [new branch]              gh/jansel/555/base          -> origin/gh/jansel/555/base
2025-12-04T09:17:18.4486505Z  * [new branch]              gh/jansel/555/head          -> origin/gh/jansel/555/head
2025-12-04T09:17:18.4488518Z  * [new branch]              gh/jansel/555/orig          -> origin/gh/jansel/555/orig
2025-12-04T09:17:18.4491009Z  * [new branch]              gh/jansel/556/base          -> origin/gh/jansel/556/base
2025-12-04T09:17:18.4493113Z  * [new branch]              gh/jansel/556/head          -> origin/gh/jansel/556/head
2025-12-04T09:17:18.4495241Z  * [new branch]              gh/jansel/556/orig          -> origin/gh/jansel/556/orig
2025-12-04T09:17:18.4498944Z  * [new branch]              gh/jansel/557/base          -> origin/gh/jansel/557/base
2025-12-04T09:17:18.4501629Z  * [new branch]              gh/jansel/557/head          -> origin/gh/jansel/557/head
2025-12-04T09:17:18.4504080Z  * [new branch]              gh/jansel/557/orig          -> origin/gh/jansel/557/orig
2025-12-04T09:17:18.4507378Z  * [new branch]              gh/jansel/558/base          -> origin/gh/jansel/558/base
2025-12-04T09:17:18.4510220Z  * [new branch]              gh/jansel/558/head          -> origin/gh/jansel/558/head
2025-12-04T09:17:18.4512627Z  * [new branch]              gh/jansel/558/orig          -> origin/gh/jansel/558/orig
2025-12-04T09:17:18.4515985Z  * [new branch]              gh/jansel/559/base          -> origin/gh/jansel/559/base
2025-12-04T09:17:18.4518328Z  * [new branch]              gh/jansel/559/head          -> origin/gh/jansel/559/head
2025-12-04T09:17:18.4520804Z  * [new branch]              gh/jansel/559/orig          -> origin/gh/jansel/559/orig
2025-12-04T09:17:18.4524225Z  * [new branch]              gh/jansel/560/base          -> origin/gh/jansel/560/base
2025-12-04T09:17:18.4525964Z  * [new branch]              gh/jansel/560/head          -> origin/gh/jansel/560/head
2025-12-04T09:17:18.4527835Z  * [new branch]              gh/jansel/560/orig          -> origin/gh/jansel/560/orig
2025-12-04T09:17:18.4530310Z  * [new branch]              gh/jansel/561/base          -> origin/gh/jansel/561/base
2025-12-04T09:17:18.4532129Z  * [new branch]              gh/jansel/561/head          -> origin/gh/jansel/561/head
2025-12-04T09:17:18.4533916Z  * [new branch]              gh/jansel/561/orig          -> origin/gh/jansel/561/orig
2025-12-04T09:17:18.4536546Z  * [new branch]              gh/jansel/562/base          -> origin/gh/jansel/562/base
2025-12-04T09:17:18.4538277Z  * [new branch]              gh/jansel/562/head          -> origin/gh/jansel/562/head
2025-12-04T09:17:18.4540362Z  * [new branch]              gh/jansel/562/orig          -> origin/gh/jansel/562/orig
2025-12-04T09:17:18.4542851Z  * [new branch]              gh/jansel/563/base          -> origin/gh/jansel/563/base
2025-12-04T09:17:18.4544715Z  * [new branch]              gh/jansel/563/head          -> origin/gh/jansel/563/head
2025-12-04T09:17:18.4546583Z  * [new branch]              gh/jansel/563/orig          -> origin/gh/jansel/563/orig
2025-12-04T09:17:18.4549587Z  * [new branch]              gh/jansel/564/base          -> origin/gh/jansel/564/base
2025-12-04T09:17:18.4551425Z  * [new branch]              gh/jansel/564/head          -> origin/gh/jansel/564/head
2025-12-04T09:17:18.4553279Z  * [new branch]              gh/jansel/564/orig          -> origin/gh/jansel/564/orig
2025-12-04T09:17:18.4555857Z  * [new branch]              gh/jansel/565/base          -> origin/gh/jansel/565/base
2025-12-04T09:17:18.4557678Z  * [new branch]              gh/jansel/565/head          -> origin/gh/jansel/565/head
2025-12-04T09:17:18.4559556Z  * [new branch]              gh/jansel/565/orig          -> origin/gh/jansel/565/orig
2025-12-04T09:17:18.4562214Z  * [new branch]              gh/jansel/566/base          -> origin/gh/jansel/566/base
2025-12-04T09:17:18.4563993Z  * [new branch]              gh/jansel/566/head          -> origin/gh/jansel/566/head
2025-12-04T09:17:18.4565903Z  * [new branch]              gh/jansel/566/orig          -> origin/gh/jansel/566/orig
2025-12-04T09:17:18.4568480Z  * [new branch]              gh/jansel/567/base          -> origin/gh/jansel/567/base
2025-12-04T09:17:18.4570498Z  * [new branch]              gh/jansel/567/head          -> origin/gh/jansel/567/head
2025-12-04T09:17:18.4572115Z  * [new branch]              gh/jansel/567/orig          -> origin/gh/jansel/567/orig
2025-12-04T09:17:18.4574752Z  * [new branch]              gh/jansel/568/base          -> origin/gh/jansel/568/base
2025-12-04T09:17:18.4576692Z  * [new branch]              gh/jansel/568/head          -> origin/gh/jansel/568/head
2025-12-04T09:17:18.4578480Z  * [new branch]              gh/jansel/568/orig          -> origin/gh/jansel/568/orig
2025-12-04T09:17:18.4581168Z  * [new branch]              gh/jansel/569/base          -> origin/gh/jansel/569/base
2025-12-04T09:17:18.4582941Z  * [new branch]              gh/jansel/569/head          -> origin/gh/jansel/569/head
2025-12-04T09:17:18.4584736Z  * [new branch]              gh/jansel/569/orig          -> origin/gh/jansel/569/orig
2025-12-04T09:17:18.4587298Z  * [new branch]              gh/jansel/570/base          -> origin/gh/jansel/570/base
2025-12-04T09:17:18.4589107Z  * [new branch]              gh/jansel/570/head          -> origin/gh/jansel/570/head
2025-12-04T09:17:18.4591136Z  * [new branch]              gh/jansel/570/orig          -> origin/gh/jansel/570/orig
2025-12-04T09:17:18.4593697Z  * [new branch]              gh/jansel/571/base          -> origin/gh/jansel/571/base
2025-12-04T09:17:18.4595537Z  * [new branch]              gh/jansel/571/head          -> origin/gh/jansel/571/head
2025-12-04T09:17:18.4597334Z  * [new branch]              gh/jansel/571/orig          -> origin/gh/jansel/571/orig
2025-12-04T09:17:18.4599805Z  * [new branch]              gh/jansel/572/base          -> origin/gh/jansel/572/base
2025-12-04T09:17:18.4601669Z  * [new branch]              gh/jansel/572/head          -> origin/gh/jansel/572/head
2025-12-04T09:17:18.4603430Z  * [new branch]              gh/jansel/572/orig          -> origin/gh/jansel/572/orig
2025-12-04T09:17:18.4606037Z  * [new branch]              gh/jansel/573/base          -> origin/gh/jansel/573/base
2025-12-04T09:17:18.4608108Z  * [new branch]              gh/jansel/573/head          -> origin/gh/jansel/573/head
2025-12-04T09:17:18.4610067Z  * [new branch]              gh/jansel/573/orig          -> origin/gh/jansel/573/orig
2025-12-04T09:17:18.4612614Z  * [new branch]              gh/jansel/574/base          -> origin/gh/jansel/574/base
2025-12-04T09:17:18.4614393Z  * [new branch]              gh/jansel/574/head          -> origin/gh/jansel/574/head
2025-12-04T09:17:18.4616374Z  * [new branch]              gh/jansel/574/orig          -> origin/gh/jansel/574/orig
2025-12-04T09:17:18.4619008Z  * [new branch]              gh/jansel/575/base          -> origin/gh/jansel/575/base
2025-12-04T09:17:18.4620874Z  * [new branch]              gh/jansel/575/head          -> origin/gh/jansel/575/head
2025-12-04T09:17:18.4622723Z  * [new branch]              gh/jansel/575/orig          -> origin/gh/jansel/575/orig
2025-12-04T09:17:18.4625355Z  * [new branch]              gh/jansel/576/base          -> origin/gh/jansel/576/base
2025-12-04T09:17:18.4627233Z  * [new branch]              gh/jansel/576/head          -> origin/gh/jansel/576/head
2025-12-04T09:17:18.4629045Z  * [new branch]              gh/jansel/576/orig          -> origin/gh/jansel/576/orig
2025-12-04T09:17:18.4632153Z  * [new branch]              gh/jbschlosser/247/base     -> origin/gh/jbschlosser/247/base
2025-12-04T09:17:18.4634009Z  * [new branch]              gh/jbschlosser/247/head     -> origin/gh/jbschlosser/247/head
2025-12-04T09:17:18.4635829Z  * [new branch]              gh/jbschlosser/247/orig     -> origin/gh/jbschlosser/247/orig
2025-12-04T09:17:18.4638331Z  * [new branch]              gh/jbschlosser/250/base     -> origin/gh/jbschlosser/250/base
2025-12-04T09:17:18.4640215Z  * [new branch]              gh/jbschlosser/250/head     -> origin/gh/jbschlosser/250/head
2025-12-04T09:17:18.4642062Z  * [new branch]              gh/jbschlosser/250/orig     -> origin/gh/jbschlosser/250/orig
2025-12-04T09:17:18.4645374Z  * [new branch]              gh/jerryzh168/1/base        -> origin/gh/jerryzh168/1/base
2025-12-04T09:17:18.4647052Z  * [new branch]              gh/jerryzh168/1/head        -> origin/gh/jerryzh168/1/head
2025-12-04T09:17:18.4648908Z  * [new branch]              gh/jerryzh168/1/orig        -> origin/gh/jerryzh168/1/orig
2025-12-04T09:17:18.4651936Z  * [new branch]              gh/jiayisunx/59/base        -> origin/gh/jiayisunx/59/base
2025-12-04T09:17:18.4653921Z  * [new branch]              gh/jiayisunx/59/head        -> origin/gh/jiayisunx/59/head
2025-12-04T09:17:18.4655553Z  * [new branch]              gh/jiayisunx/59/orig        -> origin/gh/jiayisunx/59/orig
2025-12-04T09:17:18.4658018Z  * [new branch]              gh/jiayisunx/61/base        -> origin/gh/jiayisunx/61/base
2025-12-04T09:17:18.4660012Z  * [new branch]              gh/jiayisunx/61/head        -> origin/gh/jiayisunx/61/head
2025-12-04T09:17:18.4661789Z  * [new branch]              gh/jiayisunx/61/orig        -> origin/gh/jiayisunx/61/orig
2025-12-04T09:17:18.4664399Z  * [new branch]              gh/jiayisunx/68/base        -> origin/gh/jiayisunx/68/base
2025-12-04T09:17:18.4666205Z  * [new branch]              gh/jiayisunx/68/head        -> origin/gh/jiayisunx/68/head
2025-12-04T09:17:18.4668059Z  * [new branch]              gh/jiayisunx/68/orig        -> origin/gh/jiayisunx/68/orig
2025-12-04T09:17:18.4670569Z  * [new branch]              gh/jiayisunx/77/base        -> origin/gh/jiayisunx/77/base
2025-12-04T09:17:18.4672357Z  * [new branch]              gh/jiayisunx/77/head        -> origin/gh/jiayisunx/77/head
2025-12-04T09:17:18.4674161Z  * [new branch]              gh/jiayisunx/77/orig        -> origin/gh/jiayisunx/77/orig
2025-12-04T09:17:18.4676688Z  * [new branch]              gh/jiayisunx/78/base        -> origin/gh/jiayisunx/78/base
2025-12-04T09:17:18.4678522Z  * [new branch]              gh/jiayisunx/78/head        -> origin/gh/jiayisunx/78/head
2025-12-04T09:17:18.4680331Z  * [new branch]              gh/jiayisunx/78/orig        -> origin/gh/jiayisunx/78/orig
2025-12-04T09:17:18.4682857Z  * [new branch]              gh/jiayisunx/79/base        -> origin/gh/jiayisunx/79/base
2025-12-04T09:17:18.4684721Z  * [new branch]              gh/jiayisunx/79/head        -> origin/gh/jiayisunx/79/head
2025-12-04T09:17:18.4686501Z  * [new branch]              gh/jiayisunx/79/orig        -> origin/gh/jiayisunx/79/orig
2025-12-04T09:17:18.4689147Z  * [new branch]              gh/jiayisunx/82/base        -> origin/gh/jiayisunx/82/base
2025-12-04T09:17:18.4690955Z  * [new branch]              gh/jiayisunx/82/head        -> origin/gh/jiayisunx/82/head
2025-12-04T09:17:18.4692804Z  * [new branch]              gh/jiayisunx/82/orig        -> origin/gh/jiayisunx/82/orig
2025-12-04T09:17:18.4695277Z  * [new branch]              gh/jiayisunx/83/base        -> origin/gh/jiayisunx/83/base
2025-12-04T09:17:18.4697099Z  * [new branch]              gh/jiayisunx/83/head        -> origin/gh/jiayisunx/83/head
2025-12-04T09:17:18.4698897Z  * [new branch]              gh/jiayisunx/83/orig        -> origin/gh/jiayisunx/83/orig
2025-12-04T09:17:18.4701510Z  * [new branch]              gh/jiayisunx/84/base        -> origin/gh/jiayisunx/84/base
2025-12-04T09:17:18.4703275Z  * [new branch]              gh/jiayisunx/84/head        -> origin/gh/jiayisunx/84/head
2025-12-04T09:17:18.4705071Z  * [new branch]              gh/jiayisunx/84/orig        -> origin/gh/jiayisunx/84/orig
2025-12-04T09:17:18.4707535Z  * [new branch]              gh/jiayisunx/85/base        -> origin/gh/jiayisunx/85/base
2025-12-04T09:17:18.4709544Z  * [new branch]              gh/jiayisunx/85/head        -> origin/gh/jiayisunx/85/head
2025-12-04T09:17:18.4711321Z  * [new branch]              gh/jiayisunx/85/orig        -> origin/gh/jiayisunx/85/orig
2025-12-04T09:17:18.4713868Z  * [new branch]              gh/jiayisunx/86/base        -> origin/gh/jiayisunx/86/base
2025-12-04T09:17:18.4721409Z  * [new branch]              gh/jiayisunx/86/head        -> origin/gh/jiayisunx/86/head
2025-12-04T09:17:18.4722238Z  * [new branch]              gh/jiayisunx/86/orig        -> origin/gh/jiayisunx/86/orig
2025-12-04T09:17:18.4722792Z  * [new branch]              gh/jiayisunx/87/base        -> origin/gh/jiayisunx/87/base
2025-12-04T09:17:18.4723330Z  * [new branch]              gh/jiayisunx/87/head        -> origin/gh/jiayisunx/87/head
2025-12-04T09:17:18.4723984Z  * [new branch]              gh/jiayisunx/87/orig        -> origin/gh/jiayisunx/87/orig
2025-12-04T09:17:18.4726374Z  * [new branch]              gh/jiayisunx/88/base        -> origin/gh/jiayisunx/88/base
2025-12-04T09:17:18.4728091Z  * [new branch]              gh/jiayisunx/88/head        -> origin/gh/jiayisunx/88/head
2025-12-04T09:17:18.4729926Z  * [new branch]              gh/jiayisunx/88/orig        -> origin/gh/jiayisunx/88/orig
2025-12-04T09:17:18.4732410Z  * [new branch]              gh/jiayisunx/89/base        -> origin/gh/jiayisunx/89/base
2025-12-04T09:17:18.4734258Z  * [new branch]              gh/jiayisunx/89/head        -> origin/gh/jiayisunx/89/head
2025-12-04T09:17:18.4736128Z  * [new branch]              gh/jiayisunx/89/orig        -> origin/gh/jiayisunx/89/orig
2025-12-04T09:17:18.4738887Z  * [new branch]              gh/jiayisunx/90/base        -> origin/gh/jiayisunx/90/base
2025-12-04T09:17:18.4740945Z  * [new branch]              gh/jiayisunx/90/head        -> origin/gh/jiayisunx/90/head
2025-12-04T09:17:18.4742765Z  * [new branch]              gh/jiayisunx/90/orig        -> origin/gh/jiayisunx/90/orig
2025-12-04T09:17:18.4745640Z  * [new branch]              gh/jjwu@meta.com/1/base     -> origin/gh/jjwu@meta.com/1/base
2025-12-04T09:17:18.4747437Z  * [new branch]              gh/jjwu@meta.com/1/head     -> origin/gh/jjwu@meta.com/1/head
2025-12-04T09:17:18.4750403Z  * [new branch]              gh/jturney/1/base           -> origin/gh/jturney/1/base
2025-12-04T09:17:18.4752273Z  * [new branch]              gh/jturney/1/head           -> origin/gh/jturney/1/head
2025-12-04T09:17:18.4754069Z  * [new branch]              gh/jturney/1/orig           -> origin/gh/jturney/1/orig
2025-12-04T09:17:18.4756528Z  * [new branch]              gh/jturney/2/base           -> origin/gh/jturney/2/base
2025-12-04T09:17:18.4758348Z  * [new branch]              gh/jturney/2/head           -> origin/gh/jturney/2/head
2025-12-04T09:17:18.4760326Z  * [new branch]              gh/jturney/2/orig           -> origin/gh/jturney/2/orig
2025-12-04T09:17:18.4763527Z  * [new branch]              gh/karthickai/10/base       -> origin/gh/karthickai/10/base
2025-12-04T09:17:18.4765449Z  * [new branch]              gh/karthickai/10/head       -> origin/gh/karthickai/10/head
2025-12-04T09:17:18.4767284Z  * [new branch]              gh/karthickai/10/orig       -> origin/gh/karthickai/10/orig
2025-12-04T09:17:18.4769827Z  * [new branch]              gh/karthickai/11/base       -> origin/gh/karthickai/11/base
2025-12-04T09:17:18.4771718Z  * [new branch]              gh/karthickai/11/head       -> origin/gh/karthickai/11/head
2025-12-04T09:17:18.4773560Z  * [new branch]              gh/karthickai/11/orig       -> origin/gh/karthickai/11/orig
2025-12-04T09:17:18.4776416Z  * [new branch]              gh/karthickai/12/base       -> origin/gh/karthickai/12/base
2025-12-04T09:17:18.4778303Z  * [new branch]              gh/karthickai/12/head       -> origin/gh/karthickai/12/head
2025-12-04T09:17:18.4780303Z  * [new branch]              gh/karthickai/12/orig       -> origin/gh/karthickai/12/orig
2025-12-04T09:17:18.4782835Z  * [new branch]              gh/karthickai/13/base       -> origin/gh/karthickai/13/base
2025-12-04T09:17:18.4784841Z  * [new branch]              gh/karthickai/13/head       -> origin/gh/karthickai/13/head
2025-12-04T09:17:18.4786672Z  * [new branch]              gh/karthickai/13/orig       -> origin/gh/karthickai/13/orig
2025-12-04T09:17:18.4789387Z  * [new branch]              gh/karthickai/14/base       -> origin/gh/karthickai/14/base
2025-12-04T09:17:18.4791932Z  * [new branch]              gh/karthickai/14/head       -> origin/gh/karthickai/14/head
2025-12-04T09:17:18.4793918Z  * [new branch]              gh/karthickai/14/orig       -> origin/gh/karthickai/14/orig
2025-12-04T09:17:18.4796583Z  * [new branch]              gh/karthickai/15/base       -> origin/gh/karthickai/15/base
2025-12-04T09:17:18.4798433Z  * [new branch]              gh/karthickai/15/head       -> origin/gh/karthickai/15/head
2025-12-04T09:17:18.4800221Z  * [new branch]              gh/karthickai/15/orig       -> origin/gh/karthickai/15/orig
2025-12-04T09:17:18.4802674Z  * [new branch]              gh/karthickai/16/base       -> origin/gh/karthickai/16/base
2025-12-04T09:17:18.4804561Z  * [new branch]              gh/karthickai/16/head       -> origin/gh/karthickai/16/head
2025-12-04T09:17:18.4806446Z  * [new branch]              gh/karthickai/16/orig       -> origin/gh/karthickai/16/orig
2025-12-04T09:17:18.4808843Z  * [new branch]              gh/karthickai/17/base       -> origin/gh/karthickai/17/base
2025-12-04T09:17:18.4813177Z  * [new branch]              gh/karthickai/17/head       -> origin/gh/karthickai/17/head
2025-12-04T09:17:18.4814981Z  * [new branch]              gh/karthickai/17/orig       -> origin/gh/karthickai/17/orig
2025-12-04T09:17:18.4817814Z  * [new branch]              gh/karthickai/18/base       -> origin/gh/karthickai/18/base
2025-12-04T09:17:18.4820025Z  * [new branch]              gh/karthickai/18/head       -> origin/gh/karthickai/18/head
2025-12-04T09:17:18.4821954Z  * [new branch]              gh/karthickai/18/orig       -> origin/gh/karthickai/18/orig
2025-12-04T09:17:18.4824917Z  * [new branch]              gh/karthickai/19/base       -> origin/gh/karthickai/19/base
2025-12-04T09:17:18.4826773Z  * [new branch]              gh/karthickai/19/head       -> origin/gh/karthickai/19/head
2025-12-04T09:17:18.4828586Z  * [new branch]              gh/karthickai/19/orig       -> origin/gh/karthickai/19/orig
2025-12-04T09:17:18.4832019Z  * [new branch]              gh/karthickai/20/base       -> origin/gh/karthickai/20/base
2025-12-04T09:17:18.4834533Z  * [new branch]              gh/karthickai/20/head       -> origin/gh/karthickai/20/head
2025-12-04T09:17:18.4836428Z  * [new branch]              gh/karthickai/20/orig       -> origin/gh/karthickai/20/orig
2025-12-04T09:17:18.4838985Z  * [new branch]              gh/karthickai/21/base       -> origin/gh/karthickai/21/base
2025-12-04T09:17:18.4841193Z  * [new branch]              gh/karthickai/21/head       -> origin/gh/karthickai/21/head
2025-12-04T09:17:18.4843036Z  * [new branch]              gh/karthickai/21/orig       -> origin/gh/karthickai/21/orig
2025-12-04T09:17:18.4845701Z  * [new branch]              gh/karthickai/22/base       -> origin/gh/karthickai/22/base
2025-12-04T09:17:18.4847455Z  * [new branch]              gh/karthickai/22/head       -> origin/gh/karthickai/22/head
2025-12-04T09:17:18.4849382Z  * [new branch]              gh/karthickai/22/orig       -> origin/gh/karthickai/22/orig
2025-12-04T09:17:18.4852040Z  * [new branch]              gh/karthickai/23/base       -> origin/gh/karthickai/23/base
2025-12-04T09:17:18.4853972Z  * [new branch]              gh/karthickai/23/head       -> origin/gh/karthickai/23/head
2025-12-04T09:17:18.4856355Z  * [new branch]              gh/karthickai/23/orig       -> origin/gh/karthickai/23/orig
2025-12-04T09:17:18.4859026Z  * [new branch]              gh/karthickai/24/base       -> origin/gh/karthickai/24/base
2025-12-04T09:17:18.4860921Z  * [new branch]              gh/karthickai/24/head       -> origin/gh/karthickai/24/head
2025-12-04T09:17:18.4862726Z  * [new branch]              gh/karthickai/24/orig       -> origin/gh/karthickai/24/orig
2025-12-04T09:17:18.4865774Z  * [new branch]              gh/karthickai/25/base       -> origin/gh/karthickai/25/base
2025-12-04T09:17:18.4867755Z  * [new branch]              gh/karthickai/25/head       -> origin/gh/karthickai/25/head
2025-12-04T09:17:18.4869579Z  * [new branch]              gh/karthickai/25/orig       -> origin/gh/karthickai/25/orig
2025-12-04T09:17:18.4872041Z  * [new branch]              gh/karthickai/26/base       -> origin/gh/karthickai/26/base
2025-12-04T09:17:18.4874151Z  * [new branch]              gh/karthickai/26/head       -> origin/gh/karthickai/26/head
2025-12-04T09:17:18.4875810Z  * [new branch]              gh/karthickai/26/orig       -> origin/gh/karthickai/26/orig
2025-12-04T09:17:18.4879645Z  * [new branch]              gh/karthickai/6/base        -> origin/gh/karthickai/6/base
2025-12-04T09:17:18.4882039Z  * [new branch]              gh/karthickai/6/head        -> origin/gh/karthickai/6/head
2025-12-04T09:17:18.4883857Z  * [new branch]              gh/karthickai/6/orig        -> origin/gh/karthickai/6/orig
2025-12-04T09:17:18.4886963Z  * [new branch]              gh/krocki/1/base            -> origin/gh/krocki/1/base
2025-12-04T09:17:18.4888869Z  * [new branch]              gh/krocki/1/head            -> origin/gh/krocki/1/head
2025-12-04T09:17:18.4890670Z  * [new branch]              gh/krocki/1/orig            -> origin/gh/krocki/1/orig
2025-12-04T09:17:18.4893277Z  * [new branch]              gh/krocki/2/base            -> origin/gh/krocki/2/base
2025-12-04T09:17:18.4895160Z  * [new branch]              gh/krocki/2/head            -> origin/gh/krocki/2/head
2025-12-04T09:17:18.4897477Z  * [new branch]              gh/krocki/2/orig            -> origin/gh/krocki/2/orig
2025-12-04T09:17:18.4900795Z  * [new branch]              gh/kurtamohler/60/base      -> origin/gh/kurtamohler/60/base
2025-12-04T09:17:18.4902652Z  * [new branch]              gh/kurtamohler/60/head      -> origin/gh/kurtamohler/60/head
2025-12-04T09:17:18.4904454Z  * [new branch]              gh/kurtamohler/60/orig      -> origin/gh/kurtamohler/60/orig
2025-12-04T09:17:18.4906969Z  * [new branch]              gh/kurtamohler/61/base      -> origin/gh/kurtamohler/61/base
2025-12-04T09:17:18.4908799Z  * [new branch]              gh/kurtamohler/61/head      -> origin/gh/kurtamohler/61/head
2025-12-04T09:17:18.4910834Z  * [new branch]              gh/kurtamohler/61/orig      -> origin/gh/kurtamohler/61/orig
2025-12-04T09:17:18.4913494Z  * [new branch]              gh/kurtamohler/62/base      -> origin/gh/kurtamohler/62/base
2025-12-04T09:17:18.4915579Z  * [new branch]              gh/kurtamohler/62/head      -> origin/gh/kurtamohler/62/head
2025-12-04T09:17:18.4917426Z  * [new branch]              gh/kurtamohler/62/orig      -> origin/gh/kurtamohler/62/orig
2025-12-04T09:17:18.4920094Z  * [new branch]              gh/kurtamohler/63/base      -> origin/gh/kurtamohler/63/base
2025-12-04T09:17:18.4921901Z  * [new branch]              gh/kurtamohler/63/head      -> origin/gh/kurtamohler/63/head
2025-12-04T09:17:18.4923722Z  * [new branch]              gh/kurtamohler/63/orig      -> origin/gh/kurtamohler/63/orig
2025-12-04T09:17:18.4926248Z  * [new branch]              gh/kurtamohler/64/base      -> origin/gh/kurtamohler/64/base
2025-12-04T09:17:18.4928021Z  * [new branch]              gh/kurtamohler/64/head      -> origin/gh/kurtamohler/64/head
2025-12-04T09:17:18.4929834Z  * [new branch]              gh/kurtamohler/64/orig      -> origin/gh/kurtamohler/64/orig
2025-12-04T09:17:18.4932315Z  * [new branch]              gh/kurtamohler/65/base      -> origin/gh/kurtamohler/65/base
2025-12-04T09:17:18.4934162Z  * [new branch]              gh/kurtamohler/65/head      -> origin/gh/kurtamohler/65/head
2025-12-04T09:17:18.4936071Z  * [new branch]              gh/kurtamohler/65/orig      -> origin/gh/kurtamohler/65/orig
2025-12-04T09:17:18.4938448Z  * [new branch]              gh/kurtamohler/66/base      -> origin/gh/kurtamohler/66/base
2025-12-04T09:17:18.4940483Z  * [new branch]              gh/kurtamohler/66/head      -> origin/gh/kurtamohler/66/head
2025-12-04T09:17:18.4942283Z  * [new branch]              gh/kurtamohler/66/orig      -> origin/gh/kurtamohler/66/orig
2025-12-04T09:17:18.4945043Z  * [new branch]              gh/kurtamohler/67/base      -> origin/gh/kurtamohler/67/base
2025-12-04T09:17:18.4946814Z  * [new branch]              gh/kurtamohler/67/head      -> origin/gh/kurtamohler/67/head
2025-12-04T09:17:18.4948840Z  * [new branch]              gh/kurtamohler/67/orig      -> origin/gh/kurtamohler/67/orig
2025-12-04T09:17:18.4952057Z  * [new branch]              gh/kwen2501/130/base        -> origin/gh/kwen2501/130/base
2025-12-04T09:17:18.4954380Z  * [new branch]              gh/kwen2501/130/head        -> origin/gh/kwen2501/130/head
2025-12-04T09:17:18.4956209Z  * [new branch]              gh/kwen2501/130/orig        -> origin/gh/kwen2501/130/orig
2025-12-04T09:17:18.4958781Z  * [new branch]              gh/kwen2501/170/base        -> origin/gh/kwen2501/170/base
2025-12-04T09:17:18.4960568Z  * [new branch]              gh/kwen2501/170/head        -> origin/gh/kwen2501/170/head
2025-12-04T09:17:18.4963175Z  * [new branch]              gh/kwen2501/187/base        -> origin/gh/kwen2501/187/base
2025-12-04T09:17:18.4965036Z  * [new branch]              gh/kwen2501/187/head        -> origin/gh/kwen2501/187/head
2025-12-04T09:17:18.4966860Z  * [new branch]              gh/kwen2501/187/orig        -> origin/gh/kwen2501/187/orig
2025-12-04T09:17:18.4969477Z  * [new branch]              gh/kwen2501/188/base        -> origin/gh/kwen2501/188/base
2025-12-04T09:17:18.4971323Z  * [new branch]              gh/kwen2501/188/head        -> origin/gh/kwen2501/188/head
2025-12-04T09:17:18.4973073Z  * [new branch]              gh/kwen2501/188/orig        -> origin/gh/kwen2501/188/orig
2025-12-04T09:17:18.4975558Z  * [new branch]              gh/kwen2501/211/base        -> origin/gh/kwen2501/211/base
2025-12-04T09:17:18.4977433Z  * [new branch]              gh/kwen2501/211/head        -> origin/gh/kwen2501/211/head
2025-12-04T09:17:18.4980019Z  * [new branch]              gh/kwen2501/224/base        -> origin/gh/kwen2501/224/base
2025-12-04T09:17:18.4981821Z  * [new branch]              gh/kwen2501/224/head        -> origin/gh/kwen2501/224/head
2025-12-04T09:17:18.4983592Z  * [new branch]              gh/kwen2501/224/orig        -> origin/gh/kwen2501/224/orig
2025-12-04T09:17:18.4986130Z  * [new branch]              gh/kwen2501/228/base        -> origin/gh/kwen2501/228/base
2025-12-04T09:17:18.4987922Z  * [new branch]              gh/kwen2501/228/head        -> origin/gh/kwen2501/228/head
2025-12-04T09:17:18.4989938Z  * [new branch]              gh/kwen2501/228/orig        -> origin/gh/kwen2501/228/orig
2025-12-04T09:17:18.4992688Z  * [new branch]              gh/kwen2501/234/base        -> origin/gh/kwen2501/234/base
2025-12-04T09:17:18.4994587Z  * [new branch]              gh/kwen2501/234/head        -> origin/gh/kwen2501/234/head
2025-12-04T09:17:18.4996347Z  * [new branch]              gh/kwen2501/234/orig        -> origin/gh/kwen2501/234/orig
2025-12-04T09:17:18.4998832Z  * [new branch]              gh/kwen2501/235/base        -> origin/gh/kwen2501/235/base
2025-12-04T09:17:18.5000661Z  * [new branch]              gh/kwen2501/235/head        -> origin/gh/kwen2501/235/head
2025-12-04T09:17:18.5002474Z  * [new branch]              gh/kwen2501/235/orig        -> origin/gh/kwen2501/235/orig
2025-12-04T09:17:18.5004941Z  * [new branch]              gh/kwen2501/236/base        -> origin/gh/kwen2501/236/base
2025-12-04T09:17:18.5006756Z  * [new branch]              gh/kwen2501/236/head        -> origin/gh/kwen2501/236/head
2025-12-04T09:17:18.5008853Z  * [new branch]              gh/kwen2501/236/orig        -> origin/gh/kwen2501/236/orig
2025-12-04T09:17:18.5011283Z  * [new branch]              gh/kwen2501/237/base        -> origin/gh/kwen2501/237/base
2025-12-04T09:17:18.5013139Z  * [new branch]              gh/kwen2501/237/head        -> origin/gh/kwen2501/237/head
2025-12-04T09:17:18.5015010Z  * [new branch]              gh/kwen2501/237/orig        -> origin/gh/kwen2501/237/orig
2025-12-04T09:17:18.5017576Z  * [new branch]              gh/kwen2501/238/base        -> origin/gh/kwen2501/238/base
2025-12-04T09:17:18.5019469Z  * [new branch]              gh/kwen2501/238/head        -> origin/gh/kwen2501/238/head
2025-12-04T09:17:18.5021433Z  * [new branch]              gh/kwen2501/238/orig        -> origin/gh/kwen2501/238/orig
2025-12-04T09:17:18.5024264Z  * [new branch]              gh/kwen2501/240/base        -> origin/gh/kwen2501/240/base
2025-12-04T09:17:18.5025703Z  * [new branch]              gh/kwen2501/240/head        -> origin/gh/kwen2501/240/head
2025-12-04T09:17:18.5027478Z  * [new branch]              gh/kwen2501/240/orig        -> origin/gh/kwen2501/240/orig
2025-12-04T09:17:18.5029921Z  * [new branch]              gh/kwen2501/241/base        -> origin/gh/kwen2501/241/base
2025-12-04T09:17:18.5031749Z  * [new branch]              gh/kwen2501/241/head        -> origin/gh/kwen2501/241/head
2025-12-04T09:17:18.5033508Z  * [new branch]              gh/kwen2501/241/orig        -> origin/gh/kwen2501/241/orig
2025-12-04T09:17:18.5036008Z  * [new branch]              gh/kwen2501/247/base        -> origin/gh/kwen2501/247/base
2025-12-04T09:17:18.5037817Z  * [new branch]              gh/kwen2501/247/head        -> origin/gh/kwen2501/247/head
2025-12-04T09:17:18.5039651Z  * [new branch]              gh/kwen2501/247/orig        -> origin/gh/kwen2501/247/orig
2025-12-04T09:17:18.5042239Z  * [new branch]              gh/kwen2501/252/base        -> origin/gh/kwen2501/252/base
2025-12-04T09:17:18.5044077Z  * [new branch]              gh/kwen2501/252/head        -> origin/gh/kwen2501/252/head
2025-12-04T09:17:18.5045845Z  * [new branch]              gh/kwen2501/252/orig        -> origin/gh/kwen2501/252/orig
2025-12-04T09:17:18.5048903Z  * [new branch]              gh/kwen2501/259/base        -> origin/gh/kwen2501/259/base
2025-12-04T09:17:18.5050788Z  * [new branch]              gh/kwen2501/259/head        -> origin/gh/kwen2501/259/head
2025-12-04T09:17:18.5052684Z  * [new branch]              gh/kwen2501/259/orig        -> origin/gh/kwen2501/259/orig
2025-12-04T09:17:18.5055340Z  * [new branch]              gh/kwen2501/260/base        -> origin/gh/kwen2501/260/base
2025-12-04T09:17:18.5057281Z  * [new branch]              gh/kwen2501/260/head        -> origin/gh/kwen2501/260/head
2025-12-04T09:17:18.5059175Z  * [new branch]              gh/kwen2501/260/orig        -> origin/gh/kwen2501/260/orig
2025-12-04T09:17:18.5061799Z  * [new branch]              gh/kwen2501/268/base        -> origin/gh/kwen2501/268/base
2025-12-04T09:17:18.5063611Z  * [new branch]              gh/kwen2501/268/head        -> origin/gh/kwen2501/268/head
2025-12-04T09:17:18.5065445Z  * [new branch]              gh/kwen2501/268/orig        -> origin/gh/kwen2501/268/orig
2025-12-04T09:17:18.5068204Z  * [new branch]              gh/kwen2501/269/base        -> origin/gh/kwen2501/269/base
2025-12-04T09:17:18.5070140Z  * [new branch]              gh/kwen2501/269/head        -> origin/gh/kwen2501/269/head
2025-12-04T09:17:18.5071917Z  * [new branch]              gh/kwen2501/269/orig        -> origin/gh/kwen2501/269/orig
2025-12-04T09:17:18.5074582Z  * [new branch]              gh/kwen2501/270/base        -> origin/gh/kwen2501/270/base
2025-12-04T09:17:18.5076550Z  * [new branch]              gh/kwen2501/270/head        -> origin/gh/kwen2501/270/head
2025-12-04T09:17:18.5078397Z  * [new branch]              gh/kwen2501/270/orig        -> origin/gh/kwen2501/270/orig
2025-12-04T09:17:18.5081072Z  * [new branch]              gh/kwen2501/271/base        -> origin/gh/kwen2501/271/base
2025-12-04T09:17:18.5082880Z  * [new branch]              gh/kwen2501/271/head        -> origin/gh/kwen2501/271/head
2025-12-04T09:17:18.5084710Z  * [new branch]              gh/kwen2501/271/orig        -> origin/gh/kwen2501/271/orig
2025-12-04T09:17:18.5088006Z  * [new branch]              gh/kwen2501/274/base        -> origin/gh/kwen2501/274/base
2025-12-04T09:17:18.5089953Z  * [new branch]              gh/kwen2501/274/head        -> origin/gh/kwen2501/274/head
2025-12-04T09:17:18.5091786Z  * [new branch]              gh/kwen2501/274/orig        -> origin/gh/kwen2501/274/orig
2025-12-04T09:17:18.5094599Z  * [new branch]              gh/kwen2501/275/base        -> origin/gh/kwen2501/275/base
2025-12-04T09:17:18.5096567Z  * [new branch]              gh/kwen2501/275/head        -> origin/gh/kwen2501/275/head
2025-12-04T09:17:18.5098468Z  * [new branch]              gh/kwen2501/275/orig        -> origin/gh/kwen2501/275/orig
2025-12-04T09:17:18.5101194Z  * [new branch]              gh/kwen2501/276/base        -> origin/gh/kwen2501/276/base
2025-12-04T09:17:18.5102986Z  * [new branch]              gh/kwen2501/276/head        -> origin/gh/kwen2501/276/head
2025-12-04T09:17:18.5104759Z  * [new branch]              gh/kwen2501/276/orig        -> origin/gh/kwen2501/276/orig
2025-12-04T09:17:18.5108119Z  * [new branch]              gh/kwen2501/277/base        -> origin/gh/kwen2501/277/base
2025-12-04T09:17:18.5109960Z  * [new branch]              gh/kwen2501/277/head        -> origin/gh/kwen2501/277/head
2025-12-04T09:17:18.5111686Z  * [new branch]              gh/kwen2501/277/orig        -> origin/gh/kwen2501/277/orig
2025-12-04T09:17:18.5114753Z  * [new branch]              gh/kwen2501/278/base        -> origin/gh/kwen2501/278/base
2025-12-04T09:17:18.5116574Z  * [new branch]              gh/kwen2501/278/head        -> origin/gh/kwen2501/278/head
2025-12-04T09:17:18.5118408Z  * [new branch]              gh/kwen2501/278/orig        -> origin/gh/kwen2501/278/orig
2025-12-04T09:17:18.5121642Z  * [new branch]              gh/kwen2501/279/base        -> origin/gh/kwen2501/279/base
2025-12-04T09:17:18.5123607Z  * [new branch]              gh/kwen2501/279/head        -> origin/gh/kwen2501/279/head
2025-12-04T09:17:18.5125517Z  * [new branch]              gh/kwen2501/279/orig        -> origin/gh/kwen2501/279/orig
2025-12-04T09:17:18.5128152Z  * [new branch]              gh/kwen2501/280/base        -> origin/gh/kwen2501/280/base
2025-12-04T09:17:18.5129991Z  * [new branch]              gh/kwen2501/280/head        -> origin/gh/kwen2501/280/head
2025-12-04T09:17:18.5131885Z  * [new branch]              gh/kwen2501/280/orig        -> origin/gh/kwen2501/280/orig
2025-12-04T09:17:18.5134433Z  * [new branch]              gh/kwen2501/281/base        -> origin/gh/kwen2501/281/base
2025-12-04T09:17:18.5136323Z  * [new branch]              gh/kwen2501/281/head        -> origin/gh/kwen2501/281/head
2025-12-04T09:17:18.5138198Z  * [new branch]              gh/kwen2501/281/orig        -> origin/gh/kwen2501/281/orig
2025-12-04T09:17:18.5140963Z  * [new branch]              gh/kwen2501/282/base        -> origin/gh/kwen2501/282/base
2025-12-04T09:17:18.5142821Z  * [new branch]              gh/kwen2501/282/head        -> origin/gh/kwen2501/282/head
2025-12-04T09:17:18.5144617Z  * [new branch]              gh/kwen2501/282/orig        -> origin/gh/kwen2501/282/orig
2025-12-04T09:17:18.5147286Z  * [new branch]              gh/kwen2501/283/base        -> origin/gh/kwen2501/283/base
2025-12-04T09:17:18.5149131Z  * [new branch]              gh/kwen2501/283/head        -> origin/gh/kwen2501/283/head
2025-12-04T09:17:18.5151260Z  * [new branch]              gh/kwen2501/283/orig        -> origin/gh/kwen2501/283/orig
2025-12-04T09:17:18.5153888Z  * [new branch]              gh/kwen2501/284/base        -> origin/gh/kwen2501/284/base
2025-12-04T09:17:18.5155839Z  * [new branch]              gh/kwen2501/284/head        -> origin/gh/kwen2501/284/head
2025-12-04T09:17:18.5157666Z  * [new branch]              gh/kwen2501/284/orig        -> origin/gh/kwen2501/284/orig
2025-12-04T09:17:18.5160197Z  * [new branch]              gh/kwen2501/285/base        -> origin/gh/kwen2501/285/base
2025-12-04T09:17:18.5162021Z  * [new branch]              gh/kwen2501/285/head        -> origin/gh/kwen2501/285/head
2025-12-04T09:17:18.5163849Z  * [new branch]              gh/kwen2501/285/orig        -> origin/gh/kwen2501/285/orig
2025-12-04T09:17:18.5166414Z  * [new branch]              gh/kwen2501/286/base        -> origin/gh/kwen2501/286/base
2025-12-04T09:17:18.5168297Z  * [new branch]              gh/kwen2501/286/head        -> origin/gh/kwen2501/286/head
2025-12-04T09:17:18.5170167Z  * [new branch]              gh/kwen2501/286/orig        -> origin/gh/kwen2501/286/orig
2025-12-04T09:17:18.5172671Z  * [new branch]              gh/kwen2501/287/base        -> origin/gh/kwen2501/287/base
2025-12-04T09:17:18.5174661Z  * [new branch]              gh/kwen2501/287/head        -> origin/gh/kwen2501/287/head
2025-12-04T09:17:18.5176353Z  * [new branch]              gh/kwen2501/287/orig        -> origin/gh/kwen2501/287/orig
2025-12-04T09:17:18.5179094Z  * [new branch]              gh/kwen2501/288/base        -> origin/gh/kwen2501/288/base
2025-12-04T09:17:18.5182205Z  * [new branch]              gh/kwen2501/288/head        -> origin/gh/kwen2501/288/head
2025-12-04T09:17:18.5183649Z  * [new branch]              gh/kwen2501/288/orig        -> origin/gh/kwen2501/288/orig
2025-12-04T09:17:18.5186694Z  * [new branch]              gh/laithsakka/251/base      -> origin/gh/laithsakka/251/base
2025-12-04T09:17:18.5188542Z  * [new branch]              gh/laithsakka/251/head      -> origin/gh/laithsakka/251/head
2025-12-04T09:17:18.5190326Z  * [new branch]              gh/laithsakka/251/orig      -> origin/gh/laithsakka/251/orig
2025-12-04T09:17:18.5192803Z  * [new branch]              gh/laithsakka/276/base      -> origin/gh/laithsakka/276/base
2025-12-04T09:17:18.5194618Z  * [new branch]              gh/laithsakka/276/head      -> origin/gh/laithsakka/276/head
2025-12-04T09:17:18.5196405Z  * [new branch]              gh/laithsakka/276/orig      -> origin/gh/laithsakka/276/orig
2025-12-04T09:17:18.5199266Z  * [new branch]              gh/laithsakka/28/base       -> origin/gh/laithsakka/28/base
2025-12-04T09:17:18.5201634Z  * [new branch]              gh/laithsakka/29/base       -> origin/gh/laithsakka/29/base
2025-12-04T09:17:18.5203978Z  * [new branch]              gh/laithsakka/30/base       -> origin/gh/laithsakka/30/base
2025-12-04T09:17:18.5205813Z  * [new branch]              gh/laithsakka/30/head       -> origin/gh/laithsakka/30/head
2025-12-04T09:17:18.5208314Z  * [new branch]              gh/laithsakka/31/base       -> origin/gh/laithsakka/31/base
2025-12-04T09:17:18.5212796Z  * [new branch]              gh/laithsakka/31/head       -> origin/gh/laithsakka/31/head
2025-12-04T09:17:18.5215429Z  * [new branch]              gh/laithsakka/313/base      -> origin/gh/laithsakka/313/base
2025-12-04T09:17:18.5217309Z  * [new branch]              gh/laithsakka/313/head      -> origin/gh/laithsakka/313/head
2025-12-04T09:17:18.5219199Z  * [new branch]              gh/laithsakka/313/orig      -> origin/gh/laithsakka/313/orig
2025-12-04T09:17:18.5222013Z  * [new branch]              gh/laithsakka/316/base      -> origin/gh/laithsakka/316/base
2025-12-04T09:17:18.5223866Z  * [new branch]              gh/laithsakka/316/head      -> origin/gh/laithsakka/316/head
2025-12-04T09:17:18.5225699Z  * [new branch]              gh/laithsakka/316/orig      -> origin/gh/laithsakka/316/orig
2025-12-04T09:17:18.5228207Z  * [new branch]              gh/laithsakka/317/base      -> origin/gh/laithsakka/317/base
2025-12-04T09:17:18.5229938Z  * [new branch]              gh/laithsakka/317/head      -> origin/gh/laithsakka/317/head
2025-12-04T09:17:18.5231683Z  * [new branch]              gh/laithsakka/317/orig      -> origin/gh/laithsakka/317/orig
2025-12-04T09:17:18.5234347Z  * [new branch]              gh/laithsakka/319/base      -> origin/gh/laithsakka/319/base
2025-12-04T09:17:18.5236162Z  * [new branch]              gh/laithsakka/319/head      -> origin/gh/laithsakka/319/head
2025-12-04T09:17:18.5237966Z  * [new branch]              gh/laithsakka/319/orig      -> origin/gh/laithsakka/319/orig
2025-12-04T09:17:18.5240498Z  * [new branch]              gh/laithsakka/32/base       -> origin/gh/laithsakka/32/base
2025-12-04T09:17:18.5242384Z  * [new branch]              gh/laithsakka/32/head       -> origin/gh/laithsakka/32/head
2025-12-04T09:17:18.5244975Z  * [new branch]              gh/laithsakka/320/base      -> origin/gh/laithsakka/320/base
2025-12-04T09:17:18.5246812Z  * [new branch]              gh/laithsakka/320/head      -> origin/gh/laithsakka/320/head
2025-12-04T09:17:18.5248823Z  * [new branch]              gh/laithsakka/320/orig      -> origin/gh/laithsakka/320/orig
2025-12-04T09:17:18.5251362Z  * [new branch]              gh/laithsakka/321/base      -> origin/gh/laithsakka/321/base
2025-12-04T09:17:18.5253348Z  * [new branch]              gh/laithsakka/321/head      -> origin/gh/laithsakka/321/head
2025-12-04T09:17:18.5255089Z  * [new branch]              gh/laithsakka/321/orig      -> origin/gh/laithsakka/321/orig
2025-12-04T09:17:18.5257829Z  * [new branch]              gh/laithsakka/322/base      -> origin/gh/laithsakka/322/base
2025-12-04T09:17:18.5259855Z  * [new branch]              gh/laithsakka/322/head      -> origin/gh/laithsakka/322/head
2025-12-04T09:17:18.5262172Z  * [new branch]              gh/laithsakka/322/orig      -> origin/gh/laithsakka/322/orig
2025-12-04T09:17:18.5264964Z  * [new branch]              gh/laithsakka/323/base      -> origin/gh/laithsakka/323/base
2025-12-04T09:17:18.5266851Z  * [new branch]              gh/laithsakka/323/head      -> origin/gh/laithsakka/323/head
2025-12-04T09:17:18.5269140Z  * [new branch]              gh/laithsakka/323/orig      -> origin/gh/laithsakka/323/orig
2025-12-04T09:17:18.5271701Z  * [new branch]              gh/laithsakka/324/base      -> origin/gh/laithsakka/324/base
2025-12-04T09:17:18.5273466Z  * [new branch]              gh/laithsakka/324/head      -> origin/gh/laithsakka/324/head
2025-12-04T09:17:18.5275464Z  * [new branch]              gh/laithsakka/324/orig      -> origin/gh/laithsakka/324/orig
2025-12-04T09:17:18.5278026Z  * [new branch]              gh/laithsakka/325/base      -> origin/gh/laithsakka/325/base
2025-12-04T09:17:18.5279864Z  * [new branch]              gh/laithsakka/325/head      -> origin/gh/laithsakka/325/head
2025-12-04T09:17:18.5282188Z  * [new branch]              gh/laithsakka/325/orig      -> origin/gh/laithsakka/325/orig
2025-12-04T09:17:18.5285143Z  * [new branch]              gh/laithsakka/326/base      -> origin/gh/laithsakka/326/base
2025-12-04T09:17:18.5287024Z  * [new branch]              gh/laithsakka/326/head      -> origin/gh/laithsakka/326/head
2025-12-04T09:17:18.5288914Z  * [new branch]              gh/laithsakka/326/orig      -> origin/gh/laithsakka/326/orig
2025-12-04T09:17:18.5291518Z  * [new branch]              gh/laithsakka/327/base      -> origin/gh/laithsakka/327/base
2025-12-04T09:17:18.5293441Z  * [new branch]              gh/laithsakka/327/head      -> origin/gh/laithsakka/327/head
2025-12-04T09:17:18.5295283Z  * [new branch]              gh/laithsakka/327/orig      -> origin/gh/laithsakka/327/orig
2025-12-04T09:17:18.5297825Z  * [new branch]              gh/laithsakka/328/base      -> origin/gh/laithsakka/328/base
2025-12-04T09:17:18.5299762Z  * [new branch]              gh/laithsakka/328/head      -> origin/gh/laithsakka/328/head
2025-12-04T09:17:18.5301801Z  * [new branch]              gh/laithsakka/328/orig      -> origin/gh/laithsakka/328/orig
2025-12-04T09:17:18.5304826Z  * [new branch]              gh/liangel/4/base           -> origin/gh/liangel/4/base
2025-12-04T09:17:18.5306712Z  * [new branch]              gh/liangel/4/head           -> origin/gh/liangel/4/head
2025-12-04T09:17:18.5308934Z  * [new branch]              gh/liangel/4/orig           -> origin/gh/liangel/4/orig
2025-12-04T09:17:18.5313960Z  * [new branch]              gh/lucaskabela/1/base       -> origin/gh/lucaskabela/1/base
2025-12-04T09:17:18.5315536Z  * [new branch]              gh/lucaskabela/1/head       -> origin/gh/lucaskabela/1/head
2025-12-04T09:17:18.5318627Z  * [new branch]              gh/lw/4/base                -> origin/gh/lw/4/base
2025-12-04T09:17:18.5320506Z  * [new branch]              gh/lw/4/head                -> origin/gh/lw/4/head
2025-12-04T09:17:18.5322464Z  * [new branch]              gh/lw/4/orig                -> origin/gh/lw/4/orig
2025-12-04T09:17:18.5325162Z  * [new branch]              gh/lw/5/base                -> origin/gh/lw/5/base
2025-12-04T09:17:18.5327012Z  * [new branch]              gh/lw/5/head                -> origin/gh/lw/5/head
2025-12-04T09:17:18.5328874Z  * [new branch]              gh/lw/5/orig                -> origin/gh/lw/5/orig
2025-12-04T09:17:18.5331399Z  * [new branch]              gh/lw/6/base                -> origin/gh/lw/6/base
2025-12-04T09:17:18.5333388Z  * [new branch]              gh/lw/6/head                -> origin/gh/lw/6/head
2025-12-04T09:17:18.5335086Z  * [new branch]              gh/lw/6/orig                -> origin/gh/lw/6/orig
2025-12-04T09:17:18.5338090Z  * [new branch]              gh/malfet/14/base           -> origin/gh/malfet/14/base
2025-12-04T09:17:18.5340793Z  * [new branch]              gh/malfet/417/base          -> origin/gh/malfet/417/base
2025-12-04T09:17:18.5342523Z  * [new branch]              gh/malfet/417/head          -> origin/gh/malfet/417/head
2025-12-04T09:17:18.5344572Z  * [new branch]              gh/malfet/417/orig          -> origin/gh/malfet/417/orig
2025-12-04T09:17:18.5346802Z  * [new branch]              gh/malfet/506/base          -> origin/gh/malfet/506/base
2025-12-04T09:17:18.5349108Z  * [new branch]              gh/malfet/506/head          -> origin/gh/malfet/506/head
2025-12-04T09:17:18.5350644Z  * [new branch]              gh/malfet/506/orig          -> origin/gh/malfet/506/orig
2025-12-04T09:17:18.5353200Z  * [new branch]              gh/malfet/517/base          -> origin/gh/malfet/517/base
2025-12-04T09:17:18.5355025Z  * [new branch]              gh/malfet/517/head          -> origin/gh/malfet/517/head
2025-12-04T09:17:18.5357541Z  * [new branch]              gh/malfet/528/base          -> origin/gh/malfet/528/base
2025-12-04T09:17:18.5359291Z  * [new branch]              gh/malfet/528/head          -> origin/gh/malfet/528/head
2025-12-04T09:17:18.5361080Z  * [new branch]              gh/malfet/528/orig          -> origin/gh/malfet/528/orig
2025-12-04T09:17:18.5363578Z  * [new branch]              gh/malfet/537/base          -> origin/gh/malfet/537/base
2025-12-04T09:17:18.5365552Z  * [new branch]              gh/malfet/537/head          -> origin/gh/malfet/537/head
2025-12-04T09:17:18.5367561Z  * [new branch]              gh/malfet/537/orig          -> origin/gh/malfet/537/orig
2025-12-04T09:17:18.5369893Z  * [new branch]              gh/malfet/546/base          -> origin/gh/malfet/546/base
2025-12-04T09:17:18.5371605Z  * [new branch]              gh/malfet/546/head          -> origin/gh/malfet/546/head
2025-12-04T09:17:18.5373401Z  * [new branch]              gh/malfet/546/orig          -> origin/gh/malfet/546/orig
2025-12-04T09:17:18.5375935Z  * [new branch]              gh/malfet/565/base          -> origin/gh/malfet/565/base
2025-12-04T09:17:18.5377874Z  * [new branch]              gh/malfet/565/head          -> origin/gh/malfet/565/head
2025-12-04T09:17:18.5379891Z  * [new branch]              gh/malfet/565/orig          -> origin/gh/malfet/565/orig
2025-12-04T09:17:18.5382463Z  * [new branch]              gh/malfet/575/base          -> origin/gh/malfet/575/base
2025-12-04T09:17:18.5384258Z  * [new branch]              gh/malfet/575/head          -> origin/gh/malfet/575/head
2025-12-04T09:17:18.5385999Z  * [new branch]              gh/malfet/575/orig          -> origin/gh/malfet/575/orig
2025-12-04T09:17:18.5388570Z  * [new branch]              gh/malfet/580/base          -> origin/gh/malfet/580/base
2025-12-04T09:17:18.5390417Z  * [new branch]              gh/malfet/580/head          -> origin/gh/malfet/580/head
2025-12-04T09:17:18.5392198Z  * [new branch]              gh/malfet/580/orig          -> origin/gh/malfet/580/orig
2025-12-04T09:17:18.5394655Z  * [new branch]              gh/malfet/581/base          -> origin/gh/malfet/581/base
2025-12-04T09:17:18.5396520Z  * [new branch]              gh/malfet/581/head          -> origin/gh/malfet/581/head
2025-12-04T09:17:18.5398415Z  * [new branch]              gh/malfet/581/orig          -> origin/gh/malfet/581/orig
2025-12-04T09:17:18.5400869Z  * [new branch]              gh/malfet/583/base          -> origin/gh/malfet/583/base
2025-12-04T09:17:18.5402691Z  * [new branch]              gh/malfet/583/head          -> origin/gh/malfet/583/head
2025-12-04T09:17:18.5404466Z  * [new branch]              gh/malfet/583/orig          -> origin/gh/malfet/583/orig
2025-12-04T09:17:18.5407397Z  * [new branch]              gh/malfet/586/base          -> origin/gh/malfet/586/base
2025-12-04T09:17:18.5409634Z  * [new branch]              gh/malfet/586/head          -> origin/gh/malfet/586/head
2025-12-04T09:17:18.5411415Z  * [new branch]              gh/malfet/586/orig          -> origin/gh/malfet/586/orig
2025-12-04T09:17:18.5413851Z  * [new branch]              gh/malfet/587/base          -> origin/gh/malfet/587/base
2025-12-04T09:17:18.5415710Z  * [new branch]              gh/malfet/587/head          -> origin/gh/malfet/587/head
2025-12-04T09:17:18.5417588Z  * [new branch]              gh/malfet/587/orig          -> origin/gh/malfet/587/orig
2025-12-04T09:17:18.5420200Z  * [new branch]              gh/malfet/588/base          -> origin/gh/malfet/588/base
2025-12-04T09:17:18.5421975Z  * [new branch]              gh/malfet/588/head          -> origin/gh/malfet/588/head
2025-12-04T09:17:18.5424380Z  * [new branch]              gh/malfet/588/orig          -> origin/gh/malfet/588/orig
2025-12-04T09:17:18.5427137Z  * [new branch]              gh/malfet/589/base          -> origin/gh/malfet/589/base
2025-12-04T09:17:18.5429031Z  * [new branch]              gh/malfet/589/head          -> origin/gh/malfet/589/head
2025-12-04T09:17:18.5431012Z  * [new branch]              gh/malfet/589/orig          -> origin/gh/malfet/589/orig
2025-12-04T09:17:18.5433369Z  * [new branch]              gh/malfet/590/base          -> origin/gh/malfet/590/base
2025-12-04T09:17:18.5435295Z  * [new branch]              gh/malfet/590/head          -> origin/gh/malfet/590/head
2025-12-04T09:17:18.5437583Z  * [new branch]              gh/malfet/590/orig          -> origin/gh/malfet/590/orig
2025-12-04T09:17:18.5440653Z  * [new branch]              gh/malfet/591/base          -> origin/gh/malfet/591/base
2025-12-04T09:17:18.5442485Z  * [new branch]              gh/malfet/591/head          -> origin/gh/malfet/591/head
2025-12-04T09:17:18.5444304Z  * [new branch]              gh/malfet/591/orig          -> origin/gh/malfet/591/orig
2025-12-04T09:17:18.5446985Z  * [new branch]              gh/malfet/592/base          -> origin/gh/malfet/592/base
2025-12-04T09:17:18.5448856Z  * [new branch]              gh/malfet/592/head          -> origin/gh/malfet/592/head
2025-12-04T09:17:18.5450856Z  * [new branch]              gh/malfet/592/orig          -> origin/gh/malfet/592/orig
2025-12-04T09:17:18.5453436Z  * [new branch]              gh/malfet/593/base          -> origin/gh/malfet/593/base
2025-12-04T09:17:18.5455286Z  * [new branch]              gh/malfet/593/head          -> origin/gh/malfet/593/head
2025-12-04T09:17:18.5457213Z  * [new branch]              gh/malfet/593/orig          -> origin/gh/malfet/593/orig
2025-12-04T09:17:18.5460011Z  * [new branch]              gh/malfet/594/base          -> origin/gh/malfet/594/base
2025-12-04T09:17:18.5461742Z  * [new branch]              gh/malfet/594/head          -> origin/gh/malfet/594/head
2025-12-04T09:17:18.5463572Z  * [new branch]              gh/malfet/594/orig          -> origin/gh/malfet/594/orig
2025-12-04T09:17:18.5466299Z  * [new branch]              gh/malfet/595/base          -> origin/gh/malfet/595/base
2025-12-04T09:17:18.5468033Z  * [new branch]              gh/malfet/595/head          -> origin/gh/malfet/595/head
2025-12-04T09:17:18.5469872Z  * [new branch]              gh/malfet/595/orig          -> origin/gh/malfet/595/orig
2025-12-04T09:17:18.5472462Z  * [new branch]              gh/malfet/596/base          -> origin/gh/malfet/596/base
2025-12-04T09:17:18.5474349Z  * [new branch]              gh/malfet/596/head          -> origin/gh/malfet/596/head
2025-12-04T09:17:18.5476346Z  * [new branch]              gh/malfet/596/orig          -> origin/gh/malfet/596/orig
2025-12-04T09:17:18.5478968Z  * [new branch]              gh/malfet/597/base          -> origin/gh/malfet/597/base
2025-12-04T09:17:18.5480784Z  * [new branch]              gh/malfet/597/head          -> origin/gh/malfet/597/head
2025-12-04T09:17:18.5482726Z  * [new branch]              gh/malfet/597/orig          -> origin/gh/malfet/597/orig
2025-12-04T09:17:18.5485230Z  * [new branch]              gh/malfet/598/base          -> origin/gh/malfet/598/base
2025-12-04T09:17:18.5487837Z  * [new branch]              gh/malfet/598/head          -> origin/gh/malfet/598/head
2025-12-04T09:17:18.5489386Z  * [new branch]              gh/malfet/598/orig          -> origin/gh/malfet/598/orig
2025-12-04T09:17:18.5492165Z  * [new branch]              gh/malfet/599/base          -> origin/gh/malfet/599/base
2025-12-04T09:17:18.5493976Z  * [new branch]              gh/malfet/599/head          -> origin/gh/malfet/599/head
2025-12-04T09:17:18.5495876Z  * [new branch]              gh/malfet/599/orig          -> origin/gh/malfet/599/orig
2025-12-04T09:17:18.5498725Z  * [new branch]              gh/malfet/600/base          -> origin/gh/malfet/600/base
2025-12-04T09:17:18.5500741Z  * [new branch]              gh/malfet/600/head          -> origin/gh/malfet/600/head
2025-12-04T09:17:18.5502678Z  * [new branch]              gh/malfet/600/orig          -> origin/gh/malfet/600/orig
2025-12-04T09:17:18.5505217Z  * [new branch]              gh/malfet/601/base          -> origin/gh/malfet/601/base
2025-12-04T09:17:18.5506995Z  * [new branch]              gh/malfet/601/head          -> origin/gh/malfet/601/head
2025-12-04T09:17:18.5509000Z  * [new branch]              gh/malfet/601/orig          -> origin/gh/malfet/601/orig
2025-12-04T09:17:18.5511834Z  * [new branch]              gh/malfet/602/base          -> origin/gh/malfet/602/base
2025-12-04T09:17:18.5513585Z  * [new branch]              gh/malfet/602/head          -> origin/gh/malfet/602/head
2025-12-04T09:17:18.5515672Z  * [new branch]              gh/malfet/602/orig          -> origin/gh/malfet/602/orig
2025-12-04T09:17:18.5518012Z  * [new branch]              gh/malfet/603/base          -> origin/gh/malfet/603/base
2025-12-04T09:17:18.5519599Z  * [new branch]              gh/malfet/603/head          -> origin/gh/malfet/603/head
2025-12-04T09:17:18.5521499Z  * [new branch]              gh/malfet/603/orig          -> origin/gh/malfet/603/orig
2025-12-04T09:17:18.5524051Z  * [new branch]              gh/malfet/604/base          -> origin/gh/malfet/604/base
2025-12-04T09:17:18.5525861Z  * [new branch]              gh/malfet/604/head          -> origin/gh/malfet/604/head
2025-12-04T09:17:18.5527754Z  * [new branch]              gh/malfet/604/orig          -> origin/gh/malfet/604/orig
2025-12-04T09:17:18.5530521Z  * [new branch]              gh/malfet/605/base          -> origin/gh/malfet/605/base
2025-12-04T09:17:18.5532280Z  * [new branch]              gh/malfet/605/head          -> origin/gh/malfet/605/head
2025-12-04T09:17:18.5534054Z  * [new branch]              gh/malfet/605/orig          -> origin/gh/malfet/605/orig
2025-12-04T09:17:18.5536644Z  * [new branch]              gh/malfet/606/base          -> origin/gh/malfet/606/base
2025-12-04T09:17:18.5538567Z  * [new branch]              gh/malfet/606/head          -> origin/gh/malfet/606/head
2025-12-04T09:17:18.5540946Z  * [new branch]              gh/malfet/606/orig          -> origin/gh/malfet/606/orig
2025-12-04T09:17:18.5543608Z  * [new branch]              gh/malfet/607/base          -> origin/gh/malfet/607/base
2025-12-04T09:17:18.5545160Z  * [new branch]              gh/malfet/607/head          -> origin/gh/malfet/607/head
2025-12-04T09:17:18.5547106Z  * [new branch]              gh/malfet/607/orig          -> origin/gh/malfet/607/orig
2025-12-04T09:17:18.5549739Z  * [new branch]              gh/malfet/608/base          -> origin/gh/malfet/608/base
2025-12-04T09:17:18.5551551Z  * [new branch]              gh/malfet/608/head          -> origin/gh/malfet/608/head
2025-12-04T09:17:18.5553488Z  * [new branch]              gh/malfet/608/orig          -> origin/gh/malfet/608/orig
2025-12-04T09:17:18.5556146Z  * [new branch]              gh/malfet/609/base          -> origin/gh/malfet/609/base
2025-12-04T09:17:18.5557971Z  * [new branch]              gh/malfet/609/head          -> origin/gh/malfet/609/head
2025-12-04T09:17:18.5559919Z  * [new branch]              gh/malfet/609/orig          -> origin/gh/malfet/609/orig
2025-12-04T09:17:18.5562625Z  * [new branch]              gh/malfet/610/base          -> origin/gh/malfet/610/base
2025-12-04T09:17:18.5564517Z  * [new branch]              gh/malfet/610/head          -> origin/gh/malfet/610/head
2025-12-04T09:17:18.5566303Z  * [new branch]              gh/malfet/610/orig          -> origin/gh/malfet/610/orig
2025-12-04T09:17:18.5568911Z  * [new branch]              gh/malfet/611/base          -> origin/gh/malfet/611/base
2025-12-04T09:17:18.5570688Z  * [new branch]              gh/malfet/611/head          -> origin/gh/malfet/611/head
2025-12-04T09:17:18.5573239Z  * [new branch]              gh/malfet/611/orig          -> origin/gh/malfet/611/orig
2025-12-04T09:17:18.5575648Z  * [new branch]              gh/malfet/612/base          -> origin/gh/malfet/612/base
2025-12-04T09:17:18.5577538Z  * [new branch]              gh/malfet/612/head          -> origin/gh/malfet/612/head
2025-12-04T09:17:18.5579870Z  * [new branch]              gh/malfet/612/orig          -> origin/gh/malfet/612/orig
2025-12-04T09:17:18.5582408Z  * [new branch]              gh/malfet/64/base           -> origin/gh/malfet/64/base
2025-12-04T09:17:18.5584176Z  * [new branch]              gh/malfet/64/head           -> origin/gh/malfet/64/head
2025-12-04T09:17:18.5587310Z  * [new branch]              gh/manuelcandales/11/base   -> origin/gh/manuelcandales/11/base
2025-12-04T09:17:18.5589109Z  * [new branch]              gh/manuelcandales/11/head   -> origin/gh/manuelcandales/11/head
2025-12-04T09:17:18.5590997Z  * [new branch]              gh/manuelcandales/11/orig   -> origin/gh/manuelcandales/11/orig
2025-12-04T09:17:18.5594314Z  * [new branch]              gh/markkm/1/base            -> origin/gh/markkm/1/base
2025-12-04T09:17:18.5597466Z  * [new branch]              gh/masnesral/1/base         -> origin/gh/masnesral/1/base
2025-12-04T09:17:18.5599294Z  * [new branch]              gh/masnesral/1/head         -> origin/gh/masnesral/1/head
2025-12-04T09:17:18.5601514Z  * [new branch]              gh/masnesral/1/orig         -> origin/gh/masnesral/1/orig
2025-12-04T09:17:18.5604722Z  * [new branch]              gh/mhorowitz/0/base         -> origin/gh/mhorowitz/0/base
2025-12-04T09:17:18.5606361Z  * [new branch]              gh/mhorowitz/0/head         -> origin/gh/mhorowitz/0/head
2025-12-04T09:17:18.5608842Z  * [new branch]              gh/mhorowitz/1/base         -> origin/gh/mhorowitz/1/base
2025-12-04T09:17:18.5613238Z  * [new branch]              gh/mhorowitz/1/head         -> origin/gh/mhorowitz/1/head
2025-12-04T09:17:18.5615705Z  * [new branch]              gh/mhorowitz/2/base         -> origin/gh/mhorowitz/2/base
2025-12-04T09:17:18.5617517Z  * [new branch]              gh/mhorowitz/2/head         -> origin/gh/mhorowitz/2/head
2025-12-04T09:17:18.5620080Z  * [new branch]              gh/mhorowitz/3/base         -> origin/gh/mhorowitz/3/base
2025-12-04T09:17:18.5621961Z  * [new branch]              gh/mhorowitz/3/head         -> origin/gh/mhorowitz/3/head
2025-12-04T09:17:18.5624377Z  * [new branch]              gh/mhorowitz/4/base         -> origin/gh/mhorowitz/4/base
2025-12-04T09:17:18.5626257Z  * [new branch]              gh/mhorowitz/4/head         -> origin/gh/mhorowitz/4/head
2025-12-04T09:17:18.5628618Z  * [new branch]              gh/mhorowitz/5/base         -> origin/gh/mhorowitz/5/base
2025-12-04T09:17:18.5630255Z  * [new branch]              gh/mhorowitz/5/head         -> origin/gh/mhorowitz/5/head
2025-12-04T09:17:18.5632656Z  * [new branch]              gh/mhorowitz/6/base         -> origin/gh/mhorowitz/6/base
2025-12-04T09:17:18.5634363Z  * [new branch]              gh/mhorowitz/6/head         -> origin/gh/mhorowitz/6/head
2025-12-04T09:17:18.5637548Z  * [new branch]              gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base
2025-12-04T09:17:18.5639345Z  * [new branch]              gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head
2025-12-04T09:17:18.5641812Z  * [new branch]              gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base
2025-12-04T09:17:18.5643849Z  * [new branch]              gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head
2025-12-04T09:17:18.5646046Z  * [new branch]              gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base
2025-12-04T09:17:18.5647914Z  * [new branch]              gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head
2025-12-04T09:17:18.5650335Z  * [new branch]              gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base
2025-12-04T09:17:18.5652068Z  * [new branch]              gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head
2025-12-04T09:17:18.5654565Z  * [new branch]              gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base
2025-12-04T09:17:18.5656396Z  * [new branch]              gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head
2025-12-04T09:17:18.5658917Z  * [new branch]              gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base
2025-12-04T09:17:18.5661081Z  * [new branch]              gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head
2025-12-04T09:17:18.5662872Z  * [new branch]              gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig
2025-12-04T09:17:18.5665522Z  * [new branch]              gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base
2025-12-04T09:17:18.5667334Z  * [new branch]              gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head
2025-12-04T09:17:18.5669169Z  * [new branch]              gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig
2025-12-04T09:17:18.5671974Z  * [new branch]              gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base
2025-12-04T09:17:18.5674386Z  * [new branch]              gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head
2025-12-04T09:17:18.5676351Z  * [new branch]              gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig
2025-12-04T09:17:18.5678988Z  * [new branch]              gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base
2025-12-04T09:17:18.5680782Z  * [new branch]              gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head
2025-12-04T09:17:18.5682704Z  * [new branch]              gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig
2025-12-04T09:17:18.5685371Z  * [new branch]              gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base
2025-12-04T09:17:18.5687227Z  * [new branch]              gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head
2025-12-04T09:17:18.5689168Z  * [new branch]              gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig
2025-12-04T09:17:18.5691725Z  * [new branch]              gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base
2025-12-04T09:17:18.5693492Z  * [new branch]              gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head
2025-12-04T09:17:18.5695352Z  * [new branch]              gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig
2025-12-04T09:17:18.5698089Z  * [new branch]              gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base
2025-12-04T09:17:18.5700140Z  * [new branch]              gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head
2025-12-04T09:17:18.5701951Z  * [new branch]              gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig
2025-12-04T09:17:18.5704964Z  * [new branch]              gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base
2025-12-04T09:17:18.5706855Z  * [new branch]              gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head
2025-12-04T09:17:18.5709048Z  * [new branch]              gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig
2025-12-04T09:17:18.5714052Z  * [new branch]              gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base
2025-12-04T09:17:18.5716254Z  * [new branch]              gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head
2025-12-04T09:17:18.5718256Z  * [new branch]              gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig
2025-12-04T09:17:18.5721041Z  * [new branch]              gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base
2025-12-04T09:17:18.5723089Z  * [new branch]              gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head
2025-12-04T09:17:18.5725100Z  * [new branch]              gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig
2025-12-04T09:17:18.5727458Z  * [new branch]              gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base
2025-12-04T09:17:18.5729311Z  * [new branch]              gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head
2025-12-04T09:17:18.5731216Z  * [new branch]              gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig
2025-12-04T09:17:18.5734383Z  * [new branch]              gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base
2025-12-04T09:17:18.5736347Z  * [new branch]              gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head
2025-12-04T09:17:18.5738193Z  * [new branch]              gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig
2025-12-04T09:17:18.5740857Z  * [new branch]              gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base
2025-12-04T09:17:18.5742808Z  * [new branch]              gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head
2025-12-04T09:17:18.5744755Z  * [new branch]              gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig
2025-12-04T09:17:18.5747389Z  * [new branch]              gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base
2025-12-04T09:17:18.5749299Z  * [new branch]              gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head
2025-12-04T09:17:18.5751179Z  * [new branch]              gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig
2025-12-04T09:17:18.5753774Z  * [new branch]              gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base
2025-12-04T09:17:18.5755719Z  * [new branch]              gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head
2025-12-04T09:17:18.5757515Z  * [new branch]              gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig
2025-12-04T09:17:18.5760420Z  * [new branch]              gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base
2025-12-04T09:17:18.5762418Z  * [new branch]              gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head
2025-12-04T09:17:18.5764185Z  * [new branch]              gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig
2025-12-04T09:17:18.5766829Z  * [new branch]              gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base
2025-12-04T09:17:18.5768921Z  * [new branch]              gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head
2025-12-04T09:17:18.5770794Z  * [new branch]              gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig
2025-12-04T09:17:18.5773738Z  * [new branch]              gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base
2025-12-04T09:17:18.5775791Z  * [new branch]              gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head
2025-12-04T09:17:18.5777669Z  * [new branch]              gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig
2025-12-04T09:17:18.5780937Z  * [new branch]              gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base
2025-12-04T09:17:18.5782729Z  * [new branch]              gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head
2025-12-04T09:17:18.5784563Z  * [new branch]              gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig
2025-12-04T09:17:18.5787416Z  * [new branch]              gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base
2025-12-04T09:17:18.5789316Z  * [new branch]              gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head
2025-12-04T09:17:18.5791233Z  * [new branch]              gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig
2025-12-04T09:17:18.5793893Z  * [new branch]              gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base
2025-12-04T09:17:18.5795823Z  * [new branch]              gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head
2025-12-04T09:17:18.5797616Z  * [new branch]              gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig
2025-12-04T09:17:18.5800184Z  * [new branch]              gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base
2025-12-04T09:17:18.5802077Z  * [new branch]              gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head
2025-12-04T09:17:18.5803882Z  * [new branch]              gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig
2025-12-04T09:17:18.5806570Z  * [new branch]              gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base
2025-12-04T09:17:18.5808726Z  * [new branch]              gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head
2025-12-04T09:17:18.5812395Z  * [new branch]              gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig
2025-12-04T09:17:18.5815119Z  * [new branch]              gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base
2025-12-04T09:17:18.5817105Z  * [new branch]              gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head
2025-12-04T09:17:18.5818877Z  * [new branch]              gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig
2025-12-04T09:17:18.5821914Z  * [new branch]              gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base
2025-12-04T09:17:18.5823990Z  * [new branch]              gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head
2025-12-04T09:17:18.5826001Z  * [new branch]              gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig
2025-12-04T09:17:18.5828597Z  * [new branch]              gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base
2025-12-04T09:17:18.5830342Z  * [new branch]              gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head
2025-12-04T09:17:18.5832119Z  * [new branch]              gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig
2025-12-04T09:17:18.5834831Z  * [new branch]              gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base
2025-12-04T09:17:18.5836829Z  * [new branch]              gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head
2025-12-04T09:17:18.5838496Z  * [new branch]              gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig
2025-12-04T09:17:18.5840999Z  * [new branch]              gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base
2025-12-04T09:17:18.5843173Z  * [new branch]              gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head
2025-12-04T09:17:18.5845057Z  * [new branch]              gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig
2025-12-04T09:17:18.5847793Z  * [new branch]              gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base
2025-12-04T09:17:18.5849624Z  * [new branch]              gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head
2025-12-04T09:17:18.5851503Z  * [new branch]              gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig
2025-12-04T09:17:18.5854066Z  * [new branch]              gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base
2025-12-04T09:17:18.5856023Z  * [new branch]              gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head
2025-12-04T09:17:18.5857889Z  * [new branch]              gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig
2025-12-04T09:17:18.5860633Z  * [new branch]              gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base
2025-12-04T09:17:18.5862772Z  * [new branch]              gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head
2025-12-04T09:17:18.5864635Z  * [new branch]              gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig
2025-12-04T09:17:18.5867640Z  * [new branch]              gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base
2025-12-04T09:17:18.5869556Z  * [new branch]              gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head
2025-12-04T09:17:18.5871426Z  * [new branch]              gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig
2025-12-04T09:17:18.5874092Z  * [new branch]              gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base
2025-12-04T09:17:18.5875973Z  * [new branch]              gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head
2025-12-04T09:17:18.5877813Z  * [new branch]              gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig
2025-12-04T09:17:18.5880363Z  * [new branch]              gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base
2025-12-04T09:17:18.5882236Z  * [new branch]              gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head
2025-12-04T09:17:18.5884064Z  * [new branch]              gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig
2025-12-04T09:17:18.5886515Z  * [new branch]              gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base
2025-12-04T09:17:18.5888517Z  * [new branch]              gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head
2025-12-04T09:17:18.5890247Z  * [new branch]              gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig
2025-12-04T09:17:18.5893085Z  * [new branch]              gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base
2025-12-04T09:17:18.5894952Z  * [new branch]              gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head
2025-12-04T09:17:18.5896835Z  * [new branch]              gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig
2025-12-04T09:17:18.5899318Z  * [new branch]              gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base
2025-12-04T09:17:18.5901246Z  * [new branch]              gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head
2025-12-04T09:17:18.5903229Z  * [new branch]              gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig
2025-12-04T09:17:18.5905755Z  * [new branch]              gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base
2025-12-04T09:17:18.5907664Z  * [new branch]              gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head
2025-12-04T09:17:18.5909719Z  * [new branch]              gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig
2025-12-04T09:17:18.5912255Z  * [new branch]              gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base
2025-12-04T09:17:18.5914104Z  * [new branch]              gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head
2025-12-04T09:17:18.5915952Z  * [new branch]              gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig
2025-12-04T09:17:18.5918480Z  * [new branch]              gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base
2025-12-04T09:17:18.5920545Z  * [new branch]              gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head
2025-12-04T09:17:18.5922214Z  * [new branch]              gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig
2025-12-04T09:17:18.5924974Z  * [new branch]              gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base
2025-12-04T09:17:18.5926846Z  * [new branch]              gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head
2025-12-04T09:17:18.5928587Z  * [new branch]              gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig
2025-12-04T09:17:18.5931331Z  * [new branch]              gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base
2025-12-04T09:17:18.5932962Z  * [new branch]              gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head
2025-12-04T09:17:18.5934825Z  * [new branch]              gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig
2025-12-04T09:17:18.5937305Z  * [new branch]              gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base
2025-12-04T09:17:18.5939749Z  * [new branch]              gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head
2025-12-04T09:17:18.5941663Z  * [new branch]              gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig
2025-12-04T09:17:18.5944491Z  * [new branch]              gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base
2025-12-04T09:17:18.5946491Z  * [new branch]              gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head
2025-12-04T09:17:18.5948450Z  * [new branch]              gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig
2025-12-04T09:17:18.5951113Z  * [new branch]              gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base
2025-12-04T09:17:18.5952999Z  * [new branch]              gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head
2025-12-04T09:17:18.5954683Z  * [new branch]              gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig
2025-12-04T09:17:18.5957545Z  * [new branch]              gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base
2025-12-04T09:17:18.5959423Z  * [new branch]              gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head
2025-12-04T09:17:18.5961235Z  * [new branch]              gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig
2025-12-04T09:17:18.5963804Z  * [new branch]              gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base
2025-12-04T09:17:18.5965921Z  * [new branch]              gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head
2025-12-04T09:17:18.5967582Z  * [new branch]              gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig
2025-12-04T09:17:18.5970632Z  * [new branch]              gh/mlazos/41/base           -> origin/gh/mlazos/41/base
2025-12-04T09:17:18.5972447Z  * [new branch]              gh/mlazos/41/head           -> origin/gh/mlazos/41/head
2025-12-04T09:17:18.5974286Z  * [new branch]              gh/mlazos/41/orig           -> origin/gh/mlazos/41/orig
2025-12-04T09:17:18.5976876Z  * [new branch]              gh/mlazos/42/base           -> origin/gh/mlazos/42/base
2025-12-04T09:17:18.5978685Z  * [new branch]              gh/mlazos/42/head           -> origin/gh/mlazos/42/head
2025-12-04T09:17:18.5980618Z  * [new branch]              gh/mlazos/42/orig           -> origin/gh/mlazos/42/orig
2025-12-04T09:17:18.5983008Z  * [new branch]              gh/mlazos/43/base           -> origin/gh/mlazos/43/base
2025-12-04T09:17:18.5984796Z  * [new branch]              gh/mlazos/43/head           -> origin/gh/mlazos/43/head
2025-12-04T09:17:18.5986687Z  * [new branch]              gh/mlazos/43/orig           -> origin/gh/mlazos/43/orig
2025-12-04T09:17:18.5989035Z  * [new branch]              gh/mlazos/44/base           -> origin/gh/mlazos/44/base
2025-12-04T09:17:18.5990836Z  * [new branch]              gh/mlazos/44/head           -> origin/gh/mlazos/44/head
2025-12-04T09:17:18.5992643Z  * [new branch]              gh/mlazos/44/orig           -> origin/gh/mlazos/44/orig
2025-12-04T09:17:18.5995010Z  * [new branch]              gh/mlazos/47/base           -> origin/gh/mlazos/47/base
2025-12-04T09:17:18.5996965Z  * [new branch]              gh/mlazos/47/head           -> origin/gh/mlazos/47/head
2025-12-04T09:17:18.5998758Z  * [new branch]              gh/mlazos/47/orig           -> origin/gh/mlazos/47/orig
2025-12-04T09:17:18.6001257Z  * [new branch]              gh/mlazos/48/base           -> origin/gh/mlazos/48/base
2025-12-04T09:17:18.6003278Z  * [new branch]              gh/mlazos/48/head           -> origin/gh/mlazos/48/head
2025-12-04T09:17:18.6005341Z  * [new branch]              gh/mlazos/48/orig           -> origin/gh/mlazos/48/orig
2025-12-04T09:17:18.6007556Z  * [new branch]              gh/mlazos/49/base           -> origin/gh/mlazos/49/base
2025-12-04T09:17:18.6009559Z  * [new branch]              gh/mlazos/49/head           -> origin/gh/mlazos/49/head
2025-12-04T09:17:18.6011713Z  * [new branch]              gh/mlazos/49/orig           -> origin/gh/mlazos/49/orig
2025-12-04T09:17:18.6013903Z  * [new branch]              gh/mlazos/50/base           -> origin/gh/mlazos/50/base
2025-12-04T09:17:18.6015634Z  * [new branch]              gh/mlazos/50/head           -> origin/gh/mlazos/50/head
2025-12-04T09:17:18.6017496Z  * [new branch]              gh/mlazos/50/orig           -> origin/gh/mlazos/50/orig
2025-12-04T09:17:18.6020537Z  * [new branch]              gh/mlazos/51/base           -> origin/gh/mlazos/51/base
2025-12-04T09:17:18.6022366Z  * [new branch]              gh/mlazos/51/head           -> origin/gh/mlazos/51/head
2025-12-04T09:17:18.6024149Z  * [new branch]              gh/mlazos/51/orig           -> origin/gh/mlazos/51/orig
2025-12-04T09:17:18.6026746Z  * [new branch]              gh/mlazos/52/base           -> origin/gh/mlazos/52/base
2025-12-04T09:17:18.6028577Z  * [new branch]              gh/mlazos/52/head           -> origin/gh/mlazos/52/head
2025-12-04T09:17:18.6030865Z  * [new branch]              gh/mlazos/52/orig           -> origin/gh/mlazos/52/orig
2025-12-04T09:17:18.6033350Z  * [new branch]              gh/mlazos/53/base           -> origin/gh/mlazos/53/base
2025-12-04T09:17:18.6035135Z  * [new branch]              gh/mlazos/53/head           -> origin/gh/mlazos/53/head
2025-12-04T09:17:18.6036935Z  * [new branch]              gh/mlazos/53/orig           -> origin/gh/mlazos/53/orig
2025-12-04T09:17:18.6039324Z  * [new branch]              gh/mlazos/54/base           -> origin/gh/mlazos/54/base
2025-12-04T09:17:18.6041297Z  * [new branch]              gh/mlazos/54/head           -> origin/gh/mlazos/54/head
2025-12-04T09:17:18.6043135Z  * [new branch]              gh/mlazos/54/orig           -> origin/gh/mlazos/54/orig
2025-12-04T09:17:18.6045568Z  * [new branch]              gh/mlazos/55/base           -> origin/gh/mlazos/55/base
2025-12-04T09:17:18.6047383Z  * [new branch]              gh/mlazos/55/head           -> origin/gh/mlazos/55/head
2025-12-04T09:17:18.6049158Z  * [new branch]              gh/mlazos/55/orig           -> origin/gh/mlazos/55/orig
2025-12-04T09:17:18.6051735Z  * [new branch]              gh/mlazos/56/base           -> origin/gh/mlazos/56/base
2025-12-04T09:17:18.6053628Z  * [new branch]              gh/mlazos/56/head           -> origin/gh/mlazos/56/head
2025-12-04T09:17:18.6055486Z  * [new branch]              gh/mlazos/56/orig           -> origin/gh/mlazos/56/orig
2025-12-04T09:17:18.6057940Z  * [new branch]              gh/mlazos/57/base           -> origin/gh/mlazos/57/base
2025-12-04T09:17:18.6059897Z  * [new branch]              gh/mlazos/57/head           -> origin/gh/mlazos/57/head
2025-12-04T09:17:18.6061696Z  * [new branch]              gh/mlazos/57/orig           -> origin/gh/mlazos/57/orig
2025-12-04T09:17:18.6064839Z  * [new branch]              gh/mlazos/58/base           -> origin/gh/mlazos/58/base
2025-12-04T09:17:18.6067185Z  * [new branch]              gh/mlazos/58/head           -> origin/gh/mlazos/58/head
2025-12-04T09:17:18.6069021Z  * [new branch]              gh/mlazos/58/orig           -> origin/gh/mlazos/58/orig
2025-12-04T09:17:18.6071567Z  * [new branch]              gh/mlazos/59/base           -> origin/gh/mlazos/59/base
2025-12-04T09:17:18.6073395Z  * [new branch]              gh/mlazos/59/head           -> origin/gh/mlazos/59/head
2025-12-04T09:17:18.6075204Z  * [new branch]              gh/mlazos/59/orig           -> origin/gh/mlazos/59/orig
2025-12-04T09:17:18.6077837Z  * [new branch]              gh/mlazos/60/base           -> origin/gh/mlazos/60/base
2025-12-04T09:17:18.6079803Z  * [new branch]              gh/mlazos/60/head           -> origin/gh/mlazos/60/head
2025-12-04T09:17:18.6081470Z  * [new branch]              gh/mlazos/60/orig           -> origin/gh/mlazos/60/orig
2025-12-04T09:17:18.6084477Z  * [new branch]              gh/mlazos/61/base           -> origin/gh/mlazos/61/base
2025-12-04T09:17:18.6086331Z  * [new branch]              gh/mlazos/61/head           -> origin/gh/mlazos/61/head
2025-12-04T09:17:18.6088124Z  * [new branch]              gh/mlazos/61/orig           -> origin/gh/mlazos/61/orig
2025-12-04T09:17:18.6090695Z  * [new branch]              gh/mlazos/62/base           -> origin/gh/mlazos/62/base
2025-12-04T09:17:18.6092944Z  * [new branch]              gh/mlazos/62/head           -> origin/gh/mlazos/62/head
2025-12-04T09:17:18.6094780Z  * [new branch]              gh/mlazos/62/orig           -> origin/gh/mlazos/62/orig
2025-12-04T09:17:18.6097341Z  * [new branch]              gh/mlazos/63/base           -> origin/gh/mlazos/63/base
2025-12-04T09:17:18.6100095Z  * [new branch]              gh/mlazos/63/head           -> origin/gh/mlazos/63/head
2025-12-04T09:17:18.6101912Z  * [new branch]              gh/mlazos/63/orig           -> origin/gh/mlazos/63/orig
2025-12-04T09:17:18.6104489Z  * [new branch]              gh/mlazos/64/base           -> origin/gh/mlazos/64/base
2025-12-04T09:17:18.6106368Z  * [new branch]              gh/mlazos/64/head           -> origin/gh/mlazos/64/head
2025-12-04T09:17:18.6108298Z  * [new branch]              gh/mlazos/64/orig           -> origin/gh/mlazos/64/orig
2025-12-04T09:17:18.6110960Z  * [new branch]              gh/mlazos/65/base           -> origin/gh/mlazos/65/base
2025-12-04T09:17:18.6112791Z  * [new branch]              gh/mlazos/65/head           -> origin/gh/mlazos/65/head
2025-12-04T09:17:18.6114573Z  * [new branch]              gh/mlazos/65/orig           -> origin/gh/mlazos/65/orig
2025-12-04T09:17:18.6117190Z  * [new branch]              gh/mlazos/66/base           -> origin/gh/mlazos/66/base
2025-12-04T09:17:18.6118981Z  * [new branch]              gh/mlazos/66/head           -> origin/gh/mlazos/66/head
2025-12-04T09:17:18.6120767Z  * [new branch]              gh/mlazos/66/orig           -> origin/gh/mlazos/66/orig
2025-12-04T09:17:18.6129719Z  * [new branch]              gh/mlazos/67/base           -> origin/gh/mlazos/67/base
2025-12-04T09:17:18.6130048Z  * [new branch]              gh/mlazos/67/head           -> origin/gh/mlazos/67/head
2025-12-04T09:17:18.6130354Z  * [new branch]              gh/mlazos/67/orig           -> origin/gh/mlazos/67/orig
2025-12-04T09:17:18.6130556Z  * [new branch]              gh/mlazos/68/base           -> origin/gh/mlazos/68/base
2025-12-04T09:17:18.6131359Z  * [new branch]              gh/mlazos/68/head           -> origin/gh/mlazos/68/head
2025-12-04T09:17:18.6133356Z  * [new branch]              gh/mlazos/68/orig           -> origin/gh/mlazos/68/orig
2025-12-04T09:17:18.6135868Z  * [new branch]              gh/mlazos/69/base           -> origin/gh/mlazos/69/base
2025-12-04T09:17:18.6137849Z  * [new branch]              gh/mlazos/69/head           -> origin/gh/mlazos/69/head
2025-12-04T09:17:18.6139631Z  * [new branch]              gh/mlazos/69/orig           -> origin/gh/mlazos/69/orig
2025-12-04T09:17:18.6142269Z  * [new branch]              gh/mlazos/70/base           -> origin/gh/mlazos/70/base
2025-12-04T09:17:18.6144085Z  * [new branch]              gh/mlazos/70/head           -> origin/gh/mlazos/70/head
2025-12-04T09:17:18.6145922Z  * [new branch]              gh/mlazos/70/orig           -> origin/gh/mlazos/70/orig
2025-12-04T09:17:18.6148488Z  * [new branch]              gh/mlazos/71/base           -> origin/gh/mlazos/71/base
2025-12-04T09:17:18.6150420Z  * [new branch]              gh/mlazos/71/head           -> origin/gh/mlazos/71/head
2025-12-04T09:17:18.6152038Z  * [new branch]              gh/mlazos/71/orig           -> origin/gh/mlazos/71/orig
2025-12-04T09:17:18.6154684Z  * [new branch]              gh/mlazos/72/base           -> origin/gh/mlazos/72/base
2025-12-04T09:17:18.6156696Z  * [new branch]              gh/mlazos/72/head           -> origin/gh/mlazos/72/head
2025-12-04T09:17:18.6158311Z  * [new branch]              gh/mlazos/72/orig           -> origin/gh/mlazos/72/orig
2025-12-04T09:17:18.6161140Z  * [new branch]              gh/mlazos/73/base           -> origin/gh/mlazos/73/base
2025-12-04T09:17:18.6162888Z  * [new branch]              gh/mlazos/73/head           -> origin/gh/mlazos/73/head
2025-12-04T09:17:18.6164707Z  * [new branch]              gh/mlazos/73/orig           -> origin/gh/mlazos/73/orig
2025-12-04T09:17:18.6167828Z  * [new branch]              gh/mrmiywj/1/base           -> origin/gh/mrmiywj/1/base
2025-12-04T09:17:18.6169763Z  * [new branch]              gh/mrmiywj/1/head           -> origin/gh/mrmiywj/1/head
2025-12-04T09:17:18.6172897Z  * [new branch]              gh/muchulee8/73/base        -> origin/gh/muchulee8/73/base
2025-12-04T09:17:18.6174875Z  * [new branch]              gh/muchulee8/73/head        -> origin/gh/muchulee8/73/head
2025-12-04T09:17:18.6176842Z  * [new branch]              gh/muchulee8/73/orig        -> origin/gh/muchulee8/73/orig
2025-12-04T09:17:18.6180168Z  * [new branch]              gh/naveenthangudu/1/base    -> origin/gh/naveenthangudu/1/base
2025-12-04T09:17:18.6182036Z  * [new branch]              gh/naveenthangudu/1/head    -> origin/gh/naveenthangudu/1/head
2025-12-04T09:17:18.6183986Z  * [new branch]              gh/naveenthangudu/1/orig    -> origin/gh/naveenthangudu/1/orig
2025-12-04T09:17:18.6186473Z  * [new branch]              gh/naveenthangudu/2/base    -> origin/gh/naveenthangudu/2/base
2025-12-04T09:17:18.6188308Z  * [new branch]              gh/naveenthangudu/2/head    -> origin/gh/naveenthangudu/2/head
2025-12-04T09:17:18.6190168Z  * [new branch]              gh/naveenthangudu/2/orig    -> origin/gh/naveenthangudu/2/orig
2025-12-04T09:17:18.6192702Z  * [new branch]              gh/naveenthangudu/3/base    -> origin/gh/naveenthangudu/3/base
2025-12-04T09:17:18.6194464Z  * [new branch]              gh/naveenthangudu/3/head    -> origin/gh/naveenthangudu/3/head
2025-12-04T09:17:18.6196489Z  * [new branch]              gh/naveenthangudu/3/orig    -> origin/gh/naveenthangudu/3/orig
2025-12-04T09:17:18.6198911Z  * [new branch]              gh/naveenthangudu/4/base    -> origin/gh/naveenthangudu/4/base
2025-12-04T09:17:18.6200787Z  * [new branch]              gh/naveenthangudu/4/head    -> origin/gh/naveenthangudu/4/head
2025-12-04T09:17:18.6202824Z  * [new branch]              gh/naveenthangudu/4/orig    -> origin/gh/naveenthangudu/4/orig
2025-12-04T09:17:18.6205302Z  * [new branch]              gh/naveenthangudu/5/base    -> origin/gh/naveenthangudu/5/base
2025-12-04T09:17:18.6207143Z  * [new branch]              gh/naveenthangudu/5/head    -> origin/gh/naveenthangudu/5/head
2025-12-04T09:17:18.6211902Z  * [new branch]              gh/naveenthangudu/5/orig    -> origin/gh/naveenthangudu/5/orig
2025-12-04T09:17:18.6213904Z  * [new branch]              gh/naveenthangudu/6/base    -> origin/gh/naveenthangudu/6/base
2025-12-04T09:17:18.6214160Z  * [new branch]              gh/naveenthangudu/6/head    -> origin/gh/naveenthangudu/6/head
2025-12-04T09:17:18.6214927Z  * [new branch]              gh/naveenthangudu/6/orig    -> origin/gh/naveenthangudu/6/orig
2025-12-04T09:17:18.6217741Z  * [new branch]              gh/naveenthangudu/7/base    -> origin/gh/naveenthangudu/7/base
2025-12-04T09:17:18.6219544Z  * [new branch]              gh/naveenthangudu/7/head    -> origin/gh/naveenthangudu/7/head
2025-12-04T09:17:18.6221370Z  * [new branch]              gh/naveenthangudu/7/orig    -> origin/gh/naveenthangudu/7/orig
2025-12-04T09:17:18.6223699Z  * [new branch]              gh/naveenthangudu/8/base    -> origin/gh/naveenthangudu/8/base
2025-12-04T09:17:18.6225633Z  * [new branch]              gh/naveenthangudu/8/head    -> origin/gh/naveenthangudu/8/head
2025-12-04T09:17:18.6227635Z  * [new branch]              gh/naveenthangudu/8/orig    -> origin/gh/naveenthangudu/8/orig
2025-12-04T09:17:18.6230857Z  * [new branch]              gh/naveenthangudu/9/base    -> origin/gh/naveenthangudu/9/base
2025-12-04T09:17:18.6232406Z  * [new branch]              gh/naveenthangudu/9/head    -> origin/gh/naveenthangudu/9/head
2025-12-04T09:17:18.6234347Z  * [new branch]              gh/naveenthangudu/9/orig    -> origin/gh/naveenthangudu/9/orig
2025-12-04T09:17:18.6237461Z  * [new branch]              gh/nikitaved/1/base         -> origin/gh/nikitaved/1/base
2025-12-04T09:17:18.6239617Z  * [new branch]              gh/nikitaved/1/head         -> origin/gh/nikitaved/1/head
2025-12-04T09:17:18.6241526Z  * [new branch]              gh/nikitaved/1/orig         -> origin/gh/nikitaved/1/orig
2025-12-04T09:17:18.6244090Z  * [new branch]              gh/nikitaved/10/base        -> origin/gh/nikitaved/10/base
2025-12-04T09:17:18.6245877Z  * [new branch]              gh/nikitaved/10/head        -> origin/gh/nikitaved/10/head
2025-12-04T09:17:18.6247769Z  * [new branch]              gh/nikitaved/10/orig        -> origin/gh/nikitaved/10/orig
2025-12-04T09:17:18.6250240Z  * [new branch]              gh/nikitaved/11/base        -> origin/gh/nikitaved/11/base
2025-12-04T09:17:18.6252164Z  * [new branch]              gh/nikitaved/11/head        -> origin/gh/nikitaved/11/head
2025-12-04T09:17:18.6254632Z  * [new branch]              gh/nikitaved/11/orig        -> origin/gh/nikitaved/11/orig
2025-12-04T09:17:18.6257512Z  * [new branch]              gh/nikitaved/12/base        -> origin/gh/nikitaved/12/base
2025-12-04T09:17:18.6259500Z  * [new branch]              gh/nikitaved/12/head        -> origin/gh/nikitaved/12/head
2025-12-04T09:17:18.6261295Z  * [new branch]              gh/nikitaved/12/orig        -> origin/gh/nikitaved/12/orig
2025-12-04T09:17:18.6263837Z  * [new branch]              gh/nikitaved/13/base        -> origin/gh/nikitaved/13/base
2025-12-04T09:17:18.6265780Z  * [new branch]              gh/nikitaved/13/head        -> origin/gh/nikitaved/13/head
2025-12-04T09:17:18.6267624Z  * [new branch]              gh/nikitaved/13/orig        -> origin/gh/nikitaved/13/orig
2025-12-04T09:17:18.6270217Z  * [new branch]              gh/nikitaved/14/base        -> origin/gh/nikitaved/14/base
2025-12-04T09:17:18.6272004Z  * [new branch]              gh/nikitaved/14/head        -> origin/gh/nikitaved/14/head
2025-12-04T09:17:18.6274368Z  * [new branch]              gh/nikitaved/14/orig        -> origin/gh/nikitaved/14/orig
2025-12-04T09:17:18.6276739Z  * [new branch]              gh/nikitaved/15/base        -> origin/gh/nikitaved/15/base
2025-12-04T09:17:18.6278652Z  * [new branch]              gh/nikitaved/15/head        -> origin/gh/nikitaved/15/head
2025-12-04T09:17:18.6280472Z  * [new branch]              gh/nikitaved/15/orig        -> origin/gh/nikitaved/15/orig
2025-12-04T09:17:18.6282938Z  * [new branch]              gh/nikitaved/16/base        -> origin/gh/nikitaved/16/base
2025-12-04T09:17:18.6284811Z  * [new branch]              gh/nikitaved/16/head        -> origin/gh/nikitaved/16/head
2025-12-04T09:17:18.6286547Z  * [new branch]              gh/nikitaved/16/orig        -> origin/gh/nikitaved/16/orig
2025-12-04T09:17:18.6289061Z  * [new branch]              gh/nikitaved/2/base         -> origin/gh/nikitaved/2/base
2025-12-04T09:17:18.6290952Z  * [new branch]              gh/nikitaved/2/head         -> origin/gh/nikitaved/2/head
2025-12-04T09:17:18.6292731Z  * [new branch]              gh/nikitaved/2/orig         -> origin/gh/nikitaved/2/orig
2025-12-04T09:17:18.6295159Z  * [new branch]              gh/nikitaved/4/base         -> origin/gh/nikitaved/4/base
2025-12-04T09:17:18.6297025Z  * [new branch]              gh/nikitaved/4/head         -> origin/gh/nikitaved/4/head
2025-12-04T09:17:18.6298846Z  * [new branch]              gh/nikitaved/4/orig         -> origin/gh/nikitaved/4/orig
2025-12-04T09:17:18.6301547Z  * [new branch]              gh/nikitaved/5/base         -> origin/gh/nikitaved/5/base
2025-12-04T09:17:18.6303466Z  * [new branch]              gh/nikitaved/5/head         -> origin/gh/nikitaved/5/head
2025-12-04T09:17:18.6305457Z  * [new branch]              gh/nikitaved/5/orig         -> origin/gh/nikitaved/5/orig
2025-12-04T09:17:18.6307906Z  * [new branch]              gh/nikitaved/6/base         -> origin/gh/nikitaved/6/base
2025-12-04T09:17:18.6309950Z  * [new branch]              gh/nikitaved/6/head         -> origin/gh/nikitaved/6/head
2025-12-04T09:17:18.6311853Z  * [new branch]              gh/nikitaved/6/orig         -> origin/gh/nikitaved/6/orig
2025-12-04T09:17:18.6314432Z  * [new branch]              gh/nikitaved/8/base         -> origin/gh/nikitaved/8/base
2025-12-04T09:17:18.6316224Z  * [new branch]              gh/nikitaved/8/head         -> origin/gh/nikitaved/8/head
2025-12-04T09:17:18.6318186Z  * [new branch]              gh/nikitaved/8/orig         -> origin/gh/nikitaved/8/orig
2025-12-04T09:17:18.6320584Z  * [new branch]              gh/nikitaved/9/base         -> origin/gh/nikitaved/9/base
2025-12-04T09:17:18.6322374Z  * [new branch]              gh/nikitaved/9/head         -> origin/gh/nikitaved/9/head
2025-12-04T09:17:18.6324212Z  * [new branch]              gh/nikitaved/9/orig         -> origin/gh/nikitaved/9/orig
2025-12-04T09:17:18.6327258Z  * [new branch]              gh/oulgen/10/base           -> origin/gh/oulgen/10/base
2025-12-04T09:17:18.6329082Z  * [new branch]              gh/oulgen/10/head           -> origin/gh/oulgen/10/head
2025-12-04T09:17:18.6330915Z  * [new branch]              gh/oulgen/10/orig           -> origin/gh/oulgen/10/orig
2025-12-04T09:17:18.6333328Z  * [new branch]              gh/oulgen/11/base           -> origin/gh/oulgen/11/base
2025-12-04T09:17:18.6335816Z  * [new branch]              gh/oulgen/11/head           -> origin/gh/oulgen/11/head
2025-12-04T09:17:18.6337622Z  * [new branch]              gh/oulgen/11/orig           -> origin/gh/oulgen/11/orig
2025-12-04T09:17:18.6340177Z  * [new branch]              gh/oulgen/12/base           -> origin/gh/oulgen/12/base
2025-12-04T09:17:18.6341992Z  * [new branch]              gh/oulgen/12/head           -> origin/gh/oulgen/12/head
2025-12-04T09:17:18.6344046Z  * [new branch]              gh/oulgen/12/orig           -> origin/gh/oulgen/12/orig
2025-12-04T09:17:18.6346341Z  * [new branch]              gh/oulgen/13/base           -> origin/gh/oulgen/13/base
2025-12-04T09:17:18.6348152Z  * [new branch]              gh/oulgen/13/head           -> origin/gh/oulgen/13/head
2025-12-04T09:17:18.6349968Z  * [new branch]              gh/oulgen/13/orig           -> origin/gh/oulgen/13/orig
2025-12-04T09:17:18.6352550Z  * [new branch]              gh/oulgen/14/base           -> origin/gh/oulgen/14/base
2025-12-04T09:17:18.6354395Z  * [new branch]              gh/oulgen/14/head           -> origin/gh/oulgen/14/head
2025-12-04T09:17:18.6356597Z  * [new branch]              gh/oulgen/14/orig           -> origin/gh/oulgen/14/orig
2025-12-04T09:17:18.6358817Z  * [new branch]              gh/oulgen/15/base           -> origin/gh/oulgen/15/base
2025-12-04T09:17:18.6360637Z  * [new branch]              gh/oulgen/15/head           -> origin/gh/oulgen/15/head
2025-12-04T09:17:18.6362392Z  * [new branch]              gh/oulgen/15/orig           -> origin/gh/oulgen/15/orig
2025-12-04T09:17:18.6364789Z  * [new branch]              gh/oulgen/16/base           -> origin/gh/oulgen/16/base
2025-12-04T09:17:18.6366636Z  * [new branch]              gh/oulgen/16/head           -> origin/gh/oulgen/16/head
2025-12-04T09:17:18.6368427Z  * [new branch]              gh/oulgen/16/orig           -> origin/gh/oulgen/16/orig
2025-12-04T09:17:18.6370860Z  * [new branch]              gh/oulgen/17/base           -> origin/gh/oulgen/17/base
2025-12-04T09:17:18.6372650Z  * [new branch]              gh/oulgen/17/head           -> origin/gh/oulgen/17/head
2025-12-04T09:17:18.6374842Z  * [new branch]              gh/oulgen/17/orig           -> origin/gh/oulgen/17/orig
2025-12-04T09:17:18.6377103Z  * [new branch]              gh/oulgen/18/base           -> origin/gh/oulgen/18/base
2025-12-04T09:17:18.6378928Z  * [new branch]              gh/oulgen/18/head           -> origin/gh/oulgen/18/head
2025-12-04T09:17:18.6381037Z  * [new branch]              gh/oulgen/18/orig           -> origin/gh/oulgen/18/orig
2025-12-04T09:17:18.6383327Z  * [new branch]              gh/oulgen/19/base           -> origin/gh/oulgen/19/base
2025-12-04T09:17:18.6385141Z  * [new branch]              gh/oulgen/19/head           -> origin/gh/oulgen/19/head
2025-12-04T09:17:18.6387128Z  * [new branch]              gh/oulgen/19/orig           -> origin/gh/oulgen/19/orig
2025-12-04T09:17:18.6390055Z  * [new branch]              gh/oulgen/20/base           -> origin/gh/oulgen/20/base
2025-12-04T09:17:18.6391918Z  * [new branch]              gh/oulgen/20/head           -> origin/gh/oulgen/20/head
2025-12-04T09:17:18.6393923Z  * [new branch]              gh/oulgen/20/orig           -> origin/gh/oulgen/20/orig
2025-12-04T09:17:18.6396205Z  * [new branch]              gh/oulgen/21/base           -> origin/gh/oulgen/21/base
2025-12-04T09:17:18.6397992Z  * [new branch]              gh/oulgen/21/head           -> origin/gh/oulgen/21/head
2025-12-04T09:17:18.6399816Z  * [new branch]              gh/oulgen/21/orig           -> origin/gh/oulgen/21/orig
2025-12-04T09:17:18.6402366Z  * [new branch]              gh/oulgen/22/base           -> origin/gh/oulgen/22/base
2025-12-04T09:17:18.6404763Z  * [new branch]              gh/oulgen/22/head           -> origin/gh/oulgen/22/head
2025-12-04T09:17:18.6406433Z  * [new branch]              gh/oulgen/22/orig           -> origin/gh/oulgen/22/orig
2025-12-04T09:17:18.6410033Z  * [new branch]              gh/oulgen/23/base           -> origin/gh/oulgen/23/base
2025-12-04T09:17:18.6411925Z  * [new branch]              gh/oulgen/23/head           -> origin/gh/oulgen/23/head
2025-12-04T09:17:18.6413699Z  * [new branch]              gh/oulgen/23/orig           -> origin/gh/oulgen/23/orig
2025-12-04T09:17:18.6416209Z  * [new branch]              gh/oulgen/24/base           -> origin/gh/oulgen/24/base
2025-12-04T09:17:18.6418127Z  * [new branch]              gh/oulgen/24/head           -> origin/gh/oulgen/24/head
2025-12-04T09:17:18.6420041Z  * [new branch]              gh/oulgen/24/orig           -> origin/gh/oulgen/24/orig
2025-12-04T09:17:18.6422585Z  * [new branch]              gh/oulgen/25/base           -> origin/gh/oulgen/25/base
2025-12-04T09:17:18.6424375Z  * [new branch]              gh/oulgen/25/head           -> origin/gh/oulgen/25/head
2025-12-04T09:17:18.6426466Z  * [new branch]              gh/oulgen/25/orig           -> origin/gh/oulgen/25/orig
2025-12-04T09:17:18.6428993Z  * [new branch]              gh/oulgen/26/base           -> origin/gh/oulgen/26/base
2025-12-04T09:17:18.6430589Z  * [new branch]              gh/oulgen/26/head           -> origin/gh/oulgen/26/head
2025-12-04T09:17:18.6432417Z  * [new branch]              gh/oulgen/26/orig           -> origin/gh/oulgen/26/orig
2025-12-04T09:17:18.6434870Z  * [new branch]              gh/oulgen/4/base            -> origin/gh/oulgen/4/base
2025-12-04T09:17:18.6436695Z  * [new branch]              gh/oulgen/4/head            -> origin/gh/oulgen/4/head
2025-12-04T09:17:18.6438468Z  * [new branch]              gh/oulgen/4/orig            -> origin/gh/oulgen/4/orig
2025-12-04T09:17:18.6441554Z  * [new branch]              gh/oulgen/7/base            -> origin/gh/oulgen/7/base
2025-12-04T09:17:18.6443337Z  * [new branch]              gh/oulgen/7/head            -> origin/gh/oulgen/7/head
2025-12-04T09:17:18.6445129Z  * [new branch]              gh/oulgen/7/orig            -> origin/gh/oulgen/7/orig
2025-12-04T09:17:18.6447797Z  * [new branch]              gh/oulgen/8/base            -> origin/gh/oulgen/8/base
2025-12-04T09:17:18.6449641Z  * [new branch]              gh/oulgen/8/head            -> origin/gh/oulgen/8/head
2025-12-04T09:17:18.6451608Z  * [new branch]              gh/oulgen/8/orig            -> origin/gh/oulgen/8/orig
2025-12-04T09:17:18.6454125Z  * [new branch]              gh/oulgen/9/base            -> origin/gh/oulgen/9/base
2025-12-04T09:17:18.6455941Z  * [new branch]              gh/oulgen/9/head            -> origin/gh/oulgen/9/head
2025-12-04T09:17:18.6457980Z  * [new branch]              gh/oulgen/9/orig            -> origin/gh/oulgen/9/orig
2025-12-04T09:17:18.6460549Z  * [new branch]              gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization
2025-12-04T09:17:18.6463934Z  * [new branch]              gh/pearu/108/base           -> origin/gh/pearu/108/base
2025-12-04T09:17:18.6465940Z  * [new branch]              gh/pearu/108/head           -> origin/gh/pearu/108/head
2025-12-04T09:17:18.6467778Z  * [new branch]              gh/pearu/108/orig           -> origin/gh/pearu/108/orig
2025-12-04T09:17:18.6470248Z  * [new branch]              gh/pearu/109/base           -> origin/gh/pearu/109/base
2025-12-04T09:17:18.6472038Z  * [new branch]              gh/pearu/109/head           -> origin/gh/pearu/109/head
2025-12-04T09:17:18.6473905Z  * [new branch]              gh/pearu/109/orig           -> origin/gh/pearu/109/orig
2025-12-04T09:17:18.6476825Z  * [new branch]              gh/pearu/110/base           -> origin/gh/pearu/110/base
2025-12-04T09:17:18.6478373Z  * [new branch]              gh/pearu/110/head           -> origin/gh/pearu/110/head
2025-12-04T09:17:18.6480330Z  * [new branch]              gh/pearu/110/orig           -> origin/gh/pearu/110/orig
2025-12-04T09:17:18.6482819Z  * [new branch]              gh/pearu/111/base           -> origin/gh/pearu/111/base
2025-12-04T09:17:18.6484506Z  * [new branch]              gh/pearu/111/head           -> origin/gh/pearu/111/head
2025-12-04T09:17:18.6486470Z  * [new branch]              gh/pearu/111/orig           -> origin/gh/pearu/111/orig
2025-12-04T09:17:18.6489049Z  * [new branch]              gh/pearu/112/base           -> origin/gh/pearu/112/base
2025-12-04T09:17:18.6491141Z  * [new branch]              gh/pearu/112/head           -> origin/gh/pearu/112/head
2025-12-04T09:17:18.6492683Z  * [new branch]              gh/pearu/112/orig           -> origin/gh/pearu/112/orig
2025-12-04T09:17:18.6495187Z  * [new branch]              gh/pearu/115/base           -> origin/gh/pearu/115/base
2025-12-04T09:17:18.6497038Z  * [new branch]              gh/pearu/115/head           -> origin/gh/pearu/115/head
2025-12-04T09:17:18.6498860Z  * [new branch]              gh/pearu/115/orig           -> origin/gh/pearu/115/orig
2025-12-04T09:17:18.6501665Z  * [new branch]              gh/pearu/116/base           -> origin/gh/pearu/116/base
2025-12-04T09:17:18.6503409Z  * [new branch]              gh/pearu/116/head           -> origin/gh/pearu/116/head
2025-12-04T09:17:18.6505320Z  * [new branch]              gh/pearu/116/orig           -> origin/gh/pearu/116/orig
2025-12-04T09:17:18.6507863Z  * [new branch]              gh/pearu/117/base           -> origin/gh/pearu/117/base
2025-12-04T09:17:18.6511960Z  * [new branch]              gh/pearu/117/head           -> origin/gh/pearu/117/head
2025-12-04T09:17:18.6513923Z  * [new branch]              gh/pearu/117/orig           -> origin/gh/pearu/117/orig
2025-12-04T09:17:18.6516478Z  * [new branch]              gh/pearu/118/base           -> origin/gh/pearu/118/base
2025-12-04T09:17:18.6518256Z  * [new branch]              gh/pearu/118/head           -> origin/gh/pearu/118/head
2025-12-04T09:17:18.6520084Z  * [new branch]              gh/pearu/118/orig           -> origin/gh/pearu/118/orig
2025-12-04T09:17:18.6522603Z  * [new branch]              gh/pearu/119/base           -> origin/gh/pearu/119/base
2025-12-04T09:17:18.6524787Z  * [new branch]              gh/pearu/119/head           -> origin/gh/pearu/119/head
2025-12-04T09:17:18.6526643Z  * [new branch]              gh/pearu/119/orig           -> origin/gh/pearu/119/orig
2025-12-04T09:17:18.6529246Z  * [new branch]              gh/pearu/139/base           -> origin/gh/pearu/139/base
2025-12-04T09:17:18.6531040Z  * [new branch]              gh/pearu/139/head           -> origin/gh/pearu/139/head
2025-12-04T09:17:18.6532864Z  * [new branch]              gh/pearu/139/orig           -> origin/gh/pearu/139/orig
2025-12-04T09:17:18.6535374Z  * [new branch]              gh/pearu/140/base           -> origin/gh/pearu/140/base
2025-12-04T09:17:18.6537368Z  * [new branch]              gh/pearu/140/head           -> origin/gh/pearu/140/head
2025-12-04T09:17:18.6539166Z  * [new branch]              gh/pearu/140/orig           -> origin/gh/pearu/140/orig
2025-12-04T09:17:18.6541744Z  * [new branch]              gh/pearu/142/base           -> origin/gh/pearu/142/base
2025-12-04T09:17:18.6543585Z  * [new branch]              gh/pearu/142/head           -> origin/gh/pearu/142/head
2025-12-04T09:17:18.6545429Z  * [new branch]              gh/pearu/142/orig           -> origin/gh/pearu/142/orig
2025-12-04T09:17:18.6547924Z  * [new branch]              gh/pearu/143/base           -> origin/gh/pearu/143/base
2025-12-04T09:17:18.6549722Z  * [new branch]              gh/pearu/143/head           -> origin/gh/pearu/143/head
2025-12-04T09:17:18.6551614Z  * [new branch]              gh/pearu/143/orig           -> origin/gh/pearu/143/orig
2025-12-04T09:17:18.6554237Z  * [new branch]              gh/pearu/147/base           -> origin/gh/pearu/147/base
2025-12-04T09:17:18.6556073Z  * [new branch]              gh/pearu/147/head           -> origin/gh/pearu/147/head
2025-12-04T09:17:18.6557923Z  * [new branch]              gh/pearu/147/orig           -> origin/gh/pearu/147/orig
2025-12-04T09:17:18.6560441Z  * [new branch]              gh/pearu/149/base           -> origin/gh/pearu/149/base
2025-12-04T09:17:18.6562245Z  * [new branch]              gh/pearu/149/head           -> origin/gh/pearu/149/head
2025-12-04T09:17:18.6564259Z  * [new branch]              gh/pearu/149/orig           -> origin/gh/pearu/149/orig
2025-12-04T09:17:18.6567267Z  * [new branch]              gh/pearu/150/base           -> origin/gh/pearu/150/base
2025-12-04T09:17:18.6569151Z  * [new branch]              gh/pearu/150/head           -> origin/gh/pearu/150/head
2025-12-04T09:17:18.6570913Z  * [new branch]              gh/pearu/150/orig           -> origin/gh/pearu/150/orig
2025-12-04T09:17:18.6574268Z  * [new branch]              gh/pearu/151/base           -> origin/gh/pearu/151/base
2025-12-04T09:17:18.6576677Z  * [new branch]              gh/pearu/151/head           -> origin/gh/pearu/151/head
2025-12-04T09:17:18.6578405Z  * [new branch]              gh/pearu/151/orig           -> origin/gh/pearu/151/orig
2025-12-04T09:17:18.6581302Z  * [new branch]              gh/pearu/152/base           -> origin/gh/pearu/152/base
2025-12-04T09:17:18.6583138Z  * [new branch]              gh/pearu/152/head           -> origin/gh/pearu/152/head
2025-12-04T09:17:18.6585090Z  * [new branch]              gh/pearu/152/orig           -> origin/gh/pearu/152/orig
2025-12-04T09:17:18.6587579Z  * [new branch]              gh/pearu/153/base           -> origin/gh/pearu/153/base
2025-12-04T09:17:18.6589385Z  * [new branch]              gh/pearu/153/head           -> origin/gh/pearu/153/head
2025-12-04T09:17:18.6591185Z  * [new branch]              gh/pearu/153/orig           -> origin/gh/pearu/153/orig
2025-12-04T09:17:18.6593719Z  * [new branch]              gh/pearu/154/base           -> origin/gh/pearu/154/base
2025-12-04T09:17:18.6595534Z  * [new branch]              gh/pearu/154/head           -> origin/gh/pearu/154/head
2025-12-04T09:17:18.6597344Z  * [new branch]              gh/pearu/154/orig           -> origin/gh/pearu/154/orig
2025-12-04T09:17:18.6600011Z  * [new branch]              gh/pearu/155/base           -> origin/gh/pearu/155/base
2025-12-04T09:17:18.6601861Z  * [new branch]              gh/pearu/155/head           -> origin/gh/pearu/155/head
2025-12-04T09:17:18.6603626Z  * [new branch]              gh/pearu/155/orig           -> origin/gh/pearu/155/orig
2025-12-04T09:17:18.6606276Z  * [new branch]              gh/pearu/156/base           -> origin/gh/pearu/156/base
2025-12-04T09:17:18.6608214Z  * [new branch]              gh/pearu/156/head           -> origin/gh/pearu/156/head
2025-12-04T09:17:18.6610165Z  * [new branch]              gh/pearu/156/orig           -> origin/gh/pearu/156/orig
2025-12-04T09:17:18.6613057Z  * [new branch]              gh/pearu/56/base            -> origin/gh/pearu/56/base
2025-12-04T09:17:18.6615758Z  * [new branch]              gh/pearu/56/head            -> origin/gh/pearu/56/head
2025-12-04T09:17:18.6617447Z  * [new branch]              gh/pearu/56/orig            -> origin/gh/pearu/56/orig
2025-12-04T09:17:18.6620540Z  * [new branch]              gh/pearu/97/base            -> origin/gh/pearu/97/base
2025-12-04T09:17:18.6622531Z  * [new branch]              gh/pearu/97/head            -> origin/gh/pearu/97/head
2025-12-04T09:17:18.6624281Z  * [new branch]              gh/pearu/97/orig            -> origin/gh/pearu/97/orig
2025-12-04T09:17:18.6627295Z  * [new branch]              gh/pianpwk/21/base          -> origin/gh/pianpwk/21/base
2025-12-04T09:17:18.6629097Z  * [new branch]              gh/pianpwk/21/head          -> origin/gh/pianpwk/21/head
2025-12-04T09:17:18.6631762Z  * [new branch]              gh/pianpwk/28/base          -> origin/gh/pianpwk/28/base
2025-12-04T09:17:18.6633568Z  * [new branch]              gh/pianpwk/28/head          -> origin/gh/pianpwk/28/head
2025-12-04T09:17:18.6635450Z  * [new branch]              gh/pianpwk/28/orig          -> origin/gh/pianpwk/28/orig
2025-12-04T09:17:18.6637946Z  * [new branch]              gh/pianpwk/29/base          -> origin/gh/pianpwk/29/base
2025-12-04T09:17:18.6639822Z  * [new branch]              gh/pianpwk/29/head          -> origin/gh/pianpwk/29/head
2025-12-04T09:17:18.6641669Z  * [new branch]              gh/pianpwk/29/orig          -> origin/gh/pianpwk/29/orig
2025-12-04T09:17:18.6644376Z  * [new branch]              gh/pianpwk/30/base          -> origin/gh/pianpwk/30/base
2025-12-04T09:17:18.6646232Z  * [new branch]              gh/pianpwk/30/head          -> origin/gh/pianpwk/30/head
2025-12-04T09:17:18.6648087Z  * [new branch]              gh/pianpwk/30/orig          -> origin/gh/pianpwk/30/orig
2025-12-04T09:17:18.6650634Z  * [new branch]              gh/pianpwk/31/base          -> origin/gh/pianpwk/31/base
2025-12-04T09:17:18.6652470Z  * [new branch]              gh/pianpwk/31/head          -> origin/gh/pianpwk/31/head
2025-12-04T09:17:18.6654279Z  * [new branch]              gh/pianpwk/31/orig          -> origin/gh/pianpwk/31/orig
2025-12-04T09:17:18.6656731Z  * [new branch]              gh/pianpwk/32/base          -> origin/gh/pianpwk/32/base
2025-12-04T09:17:18.6658560Z  * [new branch]              gh/pianpwk/32/head          -> origin/gh/pianpwk/32/head
2025-12-04T09:17:18.6660540Z  * [new branch]              gh/pianpwk/32/orig          -> origin/gh/pianpwk/32/orig
2025-12-04T09:17:18.6662872Z  * [new branch]              gh/pianpwk/33/base          -> origin/gh/pianpwk/33/base
2025-12-04T09:17:18.6664685Z  * [new branch]              gh/pianpwk/33/head          -> origin/gh/pianpwk/33/head
2025-12-04T09:17:18.6666464Z  * [new branch]              gh/pianpwk/33/orig          -> origin/gh/pianpwk/33/orig
2025-12-04T09:17:18.6669271Z  * [new branch]              gh/pianpwk/34/base          -> origin/gh/pianpwk/34/base
2025-12-04T09:17:18.6671377Z  * [new branch]              gh/pianpwk/34/head          -> origin/gh/pianpwk/34/head
2025-12-04T09:17:18.6673461Z  * [new branch]              gh/pianpwk/34/orig          -> origin/gh/pianpwk/34/orig
2025-12-04T09:17:18.6675939Z  * [new branch]              gh/pianpwk/35/base          -> origin/gh/pianpwk/35/base
2025-12-04T09:17:18.6677926Z  * [new branch]              gh/pianpwk/35/head          -> origin/gh/pianpwk/35/head
2025-12-04T09:17:18.6679744Z  * [new branch]              gh/pianpwk/35/orig          -> origin/gh/pianpwk/35/orig
2025-12-04T09:17:18.6682818Z  * [new branch]              gh/rec/141/base             -> origin/gh/rec/141/base
2025-12-04T09:17:18.6684694Z  * [new branch]              gh/rec/141/head             -> origin/gh/rec/141/head
2025-12-04T09:17:18.6687191Z  * [new branch]              gh/rec/153/base             -> origin/gh/rec/153/base
2025-12-04T09:17:18.6688979Z  * [new branch]              gh/rec/153/head             -> origin/gh/rec/153/head
2025-12-04T09:17:18.6690733Z  * [new branch]              gh/rec/153/orig             -> origin/gh/rec/153/orig
2025-12-04T09:17:18.6693804Z  * [new branch]              gh/rec/154/base             -> origin/gh/rec/154/base
2025-12-04T09:17:18.6695536Z  * [new branch]              gh/rec/154/head             -> origin/gh/rec/154/head
2025-12-04T09:17:18.6697329Z  * [new branch]              gh/rec/154/orig             -> origin/gh/rec/154/orig
2025-12-04T09:17:18.6699976Z  * [new branch]              gh/rec/164/base             -> origin/gh/rec/164/base
2025-12-04T09:17:18.6701805Z  * [new branch]              gh/rec/164/head             -> origin/gh/rec/164/head
2025-12-04T09:17:18.6703689Z  * [new branch]              gh/rec/164/orig             -> origin/gh/rec/164/orig
2025-12-04T09:17:18.6706203Z  * [new branch]              gh/rec/166/base             -> origin/gh/rec/166/base
2025-12-04T09:17:18.6708340Z  * [new branch]              gh/rec/166/head             -> origin/gh/rec/166/head
2025-12-04T09:17:18.6710078Z  * [new branch]              gh/rec/166/orig             -> origin/gh/rec/166/orig
2025-12-04T09:17:18.6712610Z  * [new branch]              gh/rec/167/base             -> origin/gh/rec/167/base
2025-12-04T09:17:18.6714340Z  * [new branch]              gh/rec/167/head             -> origin/gh/rec/167/head
2025-12-04T09:17:18.6716230Z  * [new branch]              gh/rec/167/orig             -> origin/gh/rec/167/orig
2025-12-04T09:17:18.6718700Z  * [new branch]              gh/rec/168/base             -> origin/gh/rec/168/base
2025-12-04T09:17:18.6720540Z  * [new branch]              gh/rec/168/head             -> origin/gh/rec/168/head
2025-12-04T09:17:18.6722271Z  * [new branch]              gh/rec/168/orig             -> origin/gh/rec/168/orig
2025-12-04T09:17:18.6724844Z  * [new branch]              gh/rec/169/base             -> origin/gh/rec/169/base
2025-12-04T09:17:18.6726780Z  * [new branch]              gh/rec/169/head             -> origin/gh/rec/169/head
2025-12-04T09:17:18.6728554Z  * [new branch]              gh/rec/169/orig             -> origin/gh/rec/169/orig
2025-12-04T09:17:18.6731179Z  * [new branch]              gh/rec/170/base             -> origin/gh/rec/170/base
2025-12-04T09:17:18.6732972Z  * [new branch]              gh/rec/170/head             -> origin/gh/rec/170/head
2025-12-04T09:17:18.6734827Z  * [new branch]              gh/rec/170/orig             -> origin/gh/rec/170/orig
2025-12-04T09:17:18.6737345Z  * [new branch]              gh/rec/171/base             -> origin/gh/rec/171/base
2025-12-04T09:17:18.6739231Z  * [new branch]              gh/rec/171/head             -> origin/gh/rec/171/head
2025-12-04T09:17:18.6741193Z  * [new branch]              gh/rec/171/orig             -> origin/gh/rec/171/orig
2025-12-04T09:17:18.6744164Z  * [new branch]              gh/rec/172/base             -> origin/gh/rec/172/base
2025-12-04T09:17:18.6746055Z  * [new branch]              gh/rec/172/head             -> origin/gh/rec/172/head
2025-12-04T09:17:18.6747788Z  * [new branch]              gh/rec/172/orig             -> origin/gh/rec/172/orig
2025-12-04T09:17:18.6750307Z  * [new branch]              gh/rec/173/base             -> origin/gh/rec/173/base
2025-12-04T09:17:18.6752097Z  * [new branch]              gh/rec/173/head             -> origin/gh/rec/173/head
2025-12-04T09:17:18.6753936Z  * [new branch]              gh/rec/173/orig             -> origin/gh/rec/173/orig
2025-12-04T09:17:18.6756522Z  * [new branch]              gh/rec/174/base             -> origin/gh/rec/174/base
2025-12-04T09:17:18.6758341Z  * [new branch]              gh/rec/174/head             -> origin/gh/rec/174/head
2025-12-04T09:17:18.6760190Z  * [new branch]              gh/rec/174/orig             -> origin/gh/rec/174/orig
2025-12-04T09:17:18.6762664Z  * [new branch]              gh/rec/175/base             -> origin/gh/rec/175/base
2025-12-04T09:17:18.6764516Z  * [new branch]              gh/rec/175/head             -> origin/gh/rec/175/head
2025-12-04T09:17:18.6766320Z  * [new branch]              gh/rec/175/orig             -> origin/gh/rec/175/orig
2025-12-04T09:17:18.6768989Z  * [new branch]              gh/rec/176/base             -> origin/gh/rec/176/base
2025-12-04T09:17:18.6770585Z  * [new branch]              gh/rec/176/head             -> origin/gh/rec/176/head
2025-12-04T09:17:18.6772375Z  * [new branch]              gh/rec/176/orig             -> origin/gh/rec/176/orig
2025-12-04T09:17:18.6774903Z  * [new branch]              gh/rec/177/base             -> origin/gh/rec/177/base
2025-12-04T09:17:18.6776738Z  * [new branch]              gh/rec/177/head             -> origin/gh/rec/177/head
2025-12-04T09:17:18.6778552Z  * [new branch]              gh/rec/177/orig             -> origin/gh/rec/177/orig
2025-12-04T09:17:18.6781933Z  * [new branch]              gh/robert-hardwick/3/base   -> origin/gh/robert-hardwick/3/base
2025-12-04T09:17:18.6783779Z  * [new branch]              gh/robert-hardwick/3/head   -> origin/gh/robert-hardwick/3/head
2025-12-04T09:17:18.6785670Z  * [new branch]              gh/robert-hardwick/3/orig   -> origin/gh/robert-hardwick/3/orig
2025-12-04T09:17:18.6788137Z  * [new branch]              gh/robert-hardwick/4/base   -> origin/gh/robert-hardwick/4/base
2025-12-04T09:17:18.6789987Z  * [new branch]              gh/robert-hardwick/4/head   -> origin/gh/robert-hardwick/4/head
2025-12-04T09:17:18.6791809Z  * [new branch]              gh/robert-hardwick/4/orig   -> origin/gh/robert-hardwick/4/orig
2025-12-04T09:17:18.6794266Z  * [new branch]              gh/robert-hardwick/5/base   -> origin/gh/robert-hardwick/5/base
2025-12-04T09:17:18.6796108Z  * [new branch]              gh/robert-hardwick/5/head   -> origin/gh/robert-hardwick/5/head
2025-12-04T09:17:18.6798030Z  * [new branch]              gh/robert-hardwick/5/orig   -> origin/gh/robert-hardwick/5/orig
2025-12-04T09:17:18.6800538Z  * [new branch]              gh/robert-hardwick/6/base   -> origin/gh/robert-hardwick/6/base
2025-12-04T09:17:18.6802918Z  * [new branch]              gh/robert-hardwick/6/head   -> origin/gh/robert-hardwick/6/head
2025-12-04T09:17:18.6804760Z  * [new branch]              gh/robert-hardwick/6/orig   -> origin/gh/robert-hardwick/6/orig
2025-12-04T09:17:18.6807272Z  * [new branch]              gh/robert-hardwick/7/base   -> origin/gh/robert-hardwick/7/base
2025-12-04T09:17:18.6809230Z  * [new branch]              gh/robert-hardwick/7/head   -> origin/gh/robert-hardwick/7/head
2025-12-04T09:17:18.6811131Z  * [new branch]              gh/robert-hardwick/7/orig   -> origin/gh/robert-hardwick/7/orig
2025-12-04T09:17:18.6813528Z  * [new branch]              gh/robert-hardwick/8/base   -> origin/gh/robert-hardwick/8/base
2025-12-04T09:17:18.6815593Z  * [new branch]              gh/robert-hardwick/8/head   -> origin/gh/robert-hardwick/8/head
2025-12-04T09:17:18.6817416Z  * [new branch]              gh/robert-hardwick/8/orig   -> origin/gh/robert-hardwick/8/orig
2025-12-04T09:17:18.6820103Z  * [new branch]              gh/robert-hardwick/9/base   -> origin/gh/robert-hardwick/9/base
2025-12-04T09:17:18.6822040Z  * [new branch]              gh/robert-hardwick/9/head   -> origin/gh/robert-hardwick/9/head
2025-12-04T09:17:18.6823733Z  * [new branch]              gh/robert-hardwick/9/orig   -> origin/gh/robert-hardwick/9/orig
2025-12-04T09:17:18.6826825Z  * [new branch]              gh/rtimpe/1/base            -> origin/gh/rtimpe/1/base
2025-12-04T09:17:18.6828811Z  * [new branch]              gh/rtimpe/1/head            -> origin/gh/rtimpe/1/head
2025-12-04T09:17:18.6831274Z  * [new branch]              gh/rtimpe/2/base            -> origin/gh/rtimpe/2/base
2025-12-04T09:17:18.6833048Z  * [new branch]              gh/rtimpe/2/head            -> origin/gh/rtimpe/2/head
2025-12-04T09:17:18.6836044Z  * [new branch]              gh/rtimpe/22/base           -> origin/gh/rtimpe/22/base
2025-12-04T09:17:18.6837824Z  * [new branch]              gh/rtimpe/22/head           -> origin/gh/rtimpe/22/head
2025-12-04T09:17:18.6839592Z  * [new branch]              gh/rtimpe/22/orig           -> origin/gh/rtimpe/22/orig
2025-12-04T09:17:18.6842000Z  * [new branch]              gh/rtimpe/23/base           -> origin/gh/rtimpe/23/base
2025-12-04T09:17:18.6844040Z  * [new branch]              gh/rtimpe/23/head           -> origin/gh/rtimpe/23/head
2025-12-04T09:17:18.6845688Z  * [new branch]              gh/rtimpe/23/orig           -> origin/gh/rtimpe/23/orig
2025-12-04T09:17:18.6848109Z  * [new branch]              gh/rtimpe/24/base           -> origin/gh/rtimpe/24/base
2025-12-04T09:17:18.6849921Z  * [new branch]              gh/rtimpe/24/head           -> origin/gh/rtimpe/24/head
2025-12-04T09:17:18.6851972Z  * [new branch]              gh/rtimpe/24/orig           -> origin/gh/rtimpe/24/orig
2025-12-04T09:17:18.6854521Z  * [new branch]              gh/rtimpe/25/base           -> origin/gh/rtimpe/25/base
2025-12-04T09:17:18.6856423Z  * [new branch]              gh/rtimpe/25/head           -> origin/gh/rtimpe/25/head
2025-12-04T09:17:18.6858255Z  * [new branch]              gh/rtimpe/25/orig           -> origin/gh/rtimpe/25/orig
2025-12-04T09:17:18.6860908Z  * [new branch]              gh/rtimpe/26/base           -> origin/gh/rtimpe/26/base
2025-12-04T09:17:18.6862698Z  * [new branch]              gh/rtimpe/26/head           -> origin/gh/rtimpe/26/head
2025-12-04T09:17:18.6864475Z  * [new branch]              gh/rtimpe/26/orig           -> origin/gh/rtimpe/26/orig
2025-12-04T09:17:18.6867495Z  * [new branch]              gh/rtimpe/27/base           -> origin/gh/rtimpe/27/base
2025-12-04T09:17:18.6869316Z  * [new branch]              gh/rtimpe/27/head           -> origin/gh/rtimpe/27/head
2025-12-04T09:17:18.6871123Z  * [new branch]              gh/rtimpe/27/orig           -> origin/gh/rtimpe/27/orig
2025-12-04T09:17:18.6873610Z  * [new branch]              gh/rtimpe/28/base           -> origin/gh/rtimpe/28/base
2025-12-04T09:17:18.6875419Z  * [new branch]              gh/rtimpe/28/head           -> origin/gh/rtimpe/28/head
2025-12-04T09:17:18.6877279Z  * [new branch]              gh/rtimpe/28/orig           -> origin/gh/rtimpe/28/orig
2025-12-04T09:17:18.6880432Z  * [new branch]              gh/rtimpe/29/base           -> origin/gh/rtimpe/29/base
2025-12-04T09:17:18.6882264Z  * [new branch]              gh/rtimpe/29/head           -> origin/gh/rtimpe/29/head
2025-12-04T09:17:18.6884047Z  * [new branch]              gh/rtimpe/29/orig           -> origin/gh/rtimpe/29/orig
2025-12-04T09:17:18.6886546Z  * [new branch]              gh/rtimpe/3/base            -> origin/gh/rtimpe/3/base
2025-12-04T09:17:18.6888289Z  * [new branch]              gh/rtimpe/3/head            -> origin/gh/rtimpe/3/head
2025-12-04T09:17:18.6890793Z  * [new branch]              gh/rtimpe/30/base           -> origin/gh/rtimpe/30/base
2025-12-04T09:17:18.6893050Z  * [new branch]              gh/rtimpe/30/head           -> origin/gh/rtimpe/30/head
2025-12-04T09:17:18.6894867Z  * [new branch]              gh/rtimpe/30/orig           -> origin/gh/rtimpe/30/orig
2025-12-04T09:17:18.6897331Z  * [new branch]              gh/rtimpe/31/base           -> origin/gh/rtimpe/31/base
2025-12-04T09:17:18.6899221Z  * [new branch]              gh/rtimpe/31/head           -> origin/gh/rtimpe/31/head
2025-12-04T09:17:18.6901205Z  * [new branch]              gh/rtimpe/31/orig           -> origin/gh/rtimpe/31/orig
2025-12-04T09:17:18.6903746Z  * [new branch]              gh/rtimpe/32/base           -> origin/gh/rtimpe/32/base
2025-12-04T09:17:18.6905590Z  * [new branch]              gh/rtimpe/32/head           -> origin/gh/rtimpe/32/head
2025-12-04T09:17:18.6907342Z  * [new branch]              gh/rtimpe/32/orig           -> origin/gh/rtimpe/32/orig
2025-12-04T09:17:18.6910170Z  * [new branch]              gh/rtimpe/33/base           -> origin/gh/rtimpe/33/base
2025-12-04T09:17:18.6911993Z  * [new branch]              gh/rtimpe/33/head           -> origin/gh/rtimpe/33/head
2025-12-04T09:17:18.6913846Z  * [new branch]              gh/rtimpe/33/orig           -> origin/gh/rtimpe/33/orig
2025-12-04T09:17:18.6916232Z  * [new branch]              gh/rtimpe/34/base           -> origin/gh/rtimpe/34/base
2025-12-04T09:17:18.6918095Z  * [new branch]              gh/rtimpe/34/head           -> origin/gh/rtimpe/34/head
2025-12-04T09:17:18.6920115Z  * [new branch]              gh/rtimpe/34/orig           -> origin/gh/rtimpe/34/orig
2025-12-04T09:17:18.6922441Z  * [new branch]              gh/rtimpe/35/base           -> origin/gh/rtimpe/35/base
2025-12-04T09:17:18.6924307Z  * [new branch]              gh/rtimpe/35/head           -> origin/gh/rtimpe/35/head
2025-12-04T09:17:18.6926142Z  * [new branch]              gh/rtimpe/35/orig           -> origin/gh/rtimpe/35/orig
2025-12-04T09:17:18.6928732Z  * [new branch]              gh/rtimpe/4/base            -> origin/gh/rtimpe/4/base
2025-12-04T09:17:18.6930479Z  * [new branch]              gh/rtimpe/4/head            -> origin/gh/rtimpe/4/head
2025-12-04T09:17:18.6933586Z  * [new branch]              gh/ruisizhang123/1/base     -> origin/gh/ruisizhang123/1/base
2025-12-04T09:17:18.6935531Z  * [new branch]              gh/ruisizhang123/1/head     -> origin/gh/ruisizhang123/1/head
2025-12-04T09:17:18.6937371Z  * [new branch]              gh/ruisizhang123/1/orig     -> origin/gh/ruisizhang123/1/orig
2025-12-04T09:17:18.6940042Z  * [new branch]              gh/ruisizhang123/4/base     -> origin/gh/ruisizhang123/4/base
2025-12-04T09:17:18.6941936Z  * [new branch]              gh/ruisizhang123/4/head     -> origin/gh/ruisizhang123/4/head
2025-12-04T09:17:18.6943729Z  * [new branch]              gh/ruisizhang123/4/orig     -> origin/gh/ruisizhang123/4/orig
2025-12-04T09:17:18.6946333Z  * [new branch]              gh/ruisizhang123/5/base     -> origin/gh/ruisizhang123/5/base
2025-12-04T09:17:18.6948132Z  * [new branch]              gh/ruisizhang123/5/head     -> origin/gh/ruisizhang123/5/head
2025-12-04T09:17:18.6950046Z  * [new branch]              gh/ruisizhang123/5/orig     -> origin/gh/ruisizhang123/5/orig
2025-12-04T09:17:18.6952601Z  * [new branch]              gh/ruisizhang123/6/base     -> origin/gh/ruisizhang123/6/base
2025-12-04T09:17:18.6954447Z  * [new branch]              gh/ruisizhang123/6/head     -> origin/gh/ruisizhang123/6/head
2025-12-04T09:17:18.6956232Z  * [new branch]              gh/ruisizhang123/6/orig     -> origin/gh/ruisizhang123/6/orig
2025-12-04T09:17:18.6958820Z  * [new branch]              gh/ruisizhang123/7/base     -> origin/gh/ruisizhang123/7/base
2025-12-04T09:17:18.6960642Z  * [new branch]              gh/ruisizhang123/7/head     -> origin/gh/ruisizhang123/7/head
2025-12-04T09:17:18.6962438Z  * [new branch]              gh/ruisizhang123/7/orig     -> origin/gh/ruisizhang123/7/orig
2025-12-04T09:17:18.6964876Z  * [new branch]              gh/ruisizhang123/8/base     -> origin/gh/ruisizhang123/8/base
2025-12-04T09:17:18.6966663Z  * [new branch]              gh/ruisizhang123/8/head     -> origin/gh/ruisizhang123/8/head
2025-12-04T09:17:18.6968478Z  * [new branch]              gh/ruisizhang123/8/orig     -> origin/gh/ruisizhang123/8/orig
2025-12-04T09:17:18.6970953Z  * [new branch]              gh/ruisizhang123/9/base     -> origin/gh/ruisizhang123/9/base
2025-12-04T09:17:18.6972753Z  * [new branch]              gh/ruisizhang123/9/head     -> origin/gh/ruisizhang123/9/head
2025-12-04T09:17:18.6974591Z  * [new branch]              gh/ruisizhang123/9/orig     -> origin/gh/ruisizhang123/9/orig
2025-12-04T09:17:18.6977853Z  * [new branch]              gh/seemethere/52/base       -> origin/gh/seemethere/52/base
2025-12-04T09:17:18.6979734Z  * [new branch]              gh/seemethere/52/head       -> origin/gh/seemethere/52/head
2025-12-04T09:17:18.6981580Z  * [new branch]              gh/seemethere/52/orig       -> origin/gh/seemethere/52/orig
2025-12-04T09:17:18.6984044Z  * [new branch]              gh/seemethere/53/base       -> origin/gh/seemethere/53/base
2025-12-04T09:17:18.6985872Z  * [new branch]              gh/seemethere/53/head       -> origin/gh/seemethere/53/head
2025-12-04T09:17:18.6987692Z  * [new branch]              gh/seemethere/53/orig       -> origin/gh/seemethere/53/orig
2025-12-04T09:17:18.6990192Z  * [new branch]              gh/seemethere/54/base       -> origin/gh/seemethere/54/base
2025-12-04T09:17:18.6992028Z  * [new branch]              gh/seemethere/54/head       -> origin/gh/seemethere/54/head
2025-12-04T09:17:18.6994037Z  * [new branch]              gh/seemethere/54/orig       -> origin/gh/seemethere/54/orig
2025-12-04T09:17:18.6996849Z  * [new branch]              gh/seemethere/55/base       -> origin/gh/seemethere/55/base
2025-12-04T09:17:18.6998588Z  * [new branch]              gh/seemethere/55/head       -> origin/gh/seemethere/55/head
2025-12-04T09:17:18.7000456Z  * [new branch]              gh/seemethere/55/orig       -> origin/gh/seemethere/55/orig
2025-12-04T09:17:18.7002958Z  * [new branch]              gh/seemethere/59/base       -> origin/gh/seemethere/59/base
2025-12-04T09:17:18.7004816Z  * [new branch]              gh/seemethere/59/head       -> origin/gh/seemethere/59/head
2025-12-04T09:17:18.7006683Z  * [new branch]              gh/seemethere/59/orig       -> origin/gh/seemethere/59/orig
2025-12-04T09:17:18.7009310Z  * [new branch]              gh/seemethere/62/base       -> origin/gh/seemethere/62/base
2025-12-04T09:17:18.7011215Z  * [new branch]              gh/seemethere/62/head       -> origin/gh/seemethere/62/head
2025-12-04T09:17:18.7013069Z  * [new branch]              gh/seemethere/62/orig       -> origin/gh/seemethere/62/orig
2025-12-04T09:17:18.7015603Z  * [new branch]              gh/seemethere/63/base       -> origin/gh/seemethere/63/base
2025-12-04T09:17:18.7017427Z  * [new branch]              gh/seemethere/63/head       -> origin/gh/seemethere/63/head
2025-12-04T09:17:18.7019401Z  * [new branch]              gh/seemethere/63/orig       -> origin/gh/seemethere/63/orig
2025-12-04T09:17:18.7021954Z  * [new branch]              gh/seemethere/71/base       -> origin/gh/seemethere/71/base
2025-12-04T09:17:18.7023726Z  * [new branch]              gh/seemethere/71/head       -> origin/gh/seemethere/71/head
2025-12-04T09:17:18.7025524Z  * [new branch]              gh/seemethere/71/orig       -> origin/gh/seemethere/71/orig
2025-12-04T09:17:18.7028263Z  * [new branch]              gh/seemethere/72/base       -> origin/gh/seemethere/72/base
2025-12-04T09:17:18.7030055Z  * [new branch]              gh/seemethere/72/head       -> origin/gh/seemethere/72/head
2025-12-04T09:17:18.7031858Z  * [new branch]              gh/seemethere/72/orig       -> origin/gh/seemethere/72/orig
2025-12-04T09:17:18.7034380Z  * [new branch]              gh/seemethere/73/base       -> origin/gh/seemethere/73/base
2025-12-04T09:17:18.7036204Z  * [new branch]              gh/seemethere/73/head       -> origin/gh/seemethere/73/head
2025-12-04T09:17:18.7037973Z  * [new branch]              gh/seemethere/73/orig       -> origin/gh/seemethere/73/orig
2025-12-04T09:17:18.7040526Z  * [new branch]              gh/seemethere/74/base       -> origin/gh/seemethere/74/base
2025-12-04T09:17:18.7042339Z  * [new branch]              gh/seemethere/74/head       -> origin/gh/seemethere/74/head
2025-12-04T09:17:18.7044185Z  * [new branch]              gh/seemethere/74/orig       -> origin/gh/seemethere/74/orig
2025-12-04T09:17:18.7046613Z  * [new branch]              gh/seemethere/75/base       -> origin/gh/seemethere/75/base
2025-12-04T09:17:18.7048444Z  * [new branch]              gh/seemethere/75/head       -> origin/gh/seemethere/75/head
2025-12-04T09:17:18.7050283Z  * [new branch]              gh/seemethere/75/orig       -> origin/gh/seemethere/75/orig
2025-12-04T09:17:18.7053046Z  * [new branch]              gh/seemethere/76/base       -> origin/gh/seemethere/76/base
2025-12-04T09:17:18.7054806Z  * [new branch]              gh/seemethere/76/head       -> origin/gh/seemethere/76/head
2025-12-04T09:17:18.7056678Z  * [new branch]              gh/seemethere/76/orig       -> origin/gh/seemethere/76/orig
2025-12-04T09:17:18.7060029Z  * [new branch]              gh/shunting314/145/base     -> origin/gh/shunting314/145/base
2025-12-04T09:17:18.7061947Z  * [new branch]              gh/shunting314/145/head     -> origin/gh/shunting314/145/head
2025-12-04T09:17:18.7063827Z  * [new branch]              gh/shunting314/145/orig     -> origin/gh/shunting314/145/orig
2025-12-04T09:17:18.7067200Z  * [new branch]              gh/shunting314/176/base     -> origin/gh/shunting314/176/base
2025-12-04T09:17:18.7069109Z  * [new branch]              gh/shunting314/176/head     -> origin/gh/shunting314/176/head
2025-12-04T09:17:18.7070969Z  * [new branch]              gh/shunting314/176/orig     -> origin/gh/shunting314/176/orig
2025-12-04T09:17:18.7073583Z  * [new branch]              gh/shunting314/249/base     -> origin/gh/shunting314/249/base
2025-12-04T09:17:18.7075434Z  * [new branch]              gh/shunting314/249/head     -> origin/gh/shunting314/249/head
2025-12-04T09:17:18.7077438Z  * [new branch]              gh/shunting314/249/orig     -> origin/gh/shunting314/249/orig
2025-12-04T09:17:18.7079990Z  * [new branch]              gh/shunting314/253/base     -> origin/gh/shunting314/253/base
2025-12-04T09:17:18.7082279Z  * [new branch]              gh/shunting314/253/head     -> origin/gh/shunting314/253/head
2025-12-04T09:17:18.7084155Z  * [new branch]              gh/shunting314/253/orig     -> origin/gh/shunting314/253/orig
2025-12-04T09:17:18.7086649Z  * [new branch]              gh/shunting314/256/base     -> origin/gh/shunting314/256/base
2025-12-04T09:17:18.7088459Z  * [new branch]              gh/shunting314/256/head     -> origin/gh/shunting314/256/head
2025-12-04T09:17:18.7090292Z  * [new branch]              gh/shunting314/256/orig     -> origin/gh/shunting314/256/orig
2025-12-04T09:17:18.7093103Z  * [new branch]              gh/shunting314/257/base     -> origin/gh/shunting314/257/base
2025-12-04T09:17:18.7094939Z  * [new branch]              gh/shunting314/257/head     -> origin/gh/shunting314/257/head
2025-12-04T09:17:18.7096734Z  * [new branch]              gh/shunting314/257/orig     -> origin/gh/shunting314/257/orig
2025-12-04T09:17:18.7099490Z  * [new branch]              gh/shunting314/258/base     -> origin/gh/shunting314/258/base
2025-12-04T09:17:18.7101285Z  * [new branch]              gh/shunting314/258/head     -> origin/gh/shunting314/258/head
2025-12-04T09:17:18.7103310Z  * [new branch]              gh/shunting314/258/orig     -> origin/gh/shunting314/258/orig
2025-12-04T09:17:18.7105616Z  * [new branch]              gh/shunting314/259/base     -> origin/gh/shunting314/259/base
2025-12-04T09:17:18.7107419Z  * [new branch]              gh/shunting314/259/head     -> origin/gh/shunting314/259/head
2025-12-04T09:17:18.7110793Z  * [new branch]              gh/shunting314/259/orig     -> origin/gh/shunting314/259/orig
2025-12-04T09:17:18.7113592Z  * [new branch]              gh/shunting314/260/base     -> origin/gh/shunting314/260/base
2025-12-04T09:17:18.7115480Z  * [new branch]              gh/shunting314/260/head     -> origin/gh/shunting314/260/head
2025-12-04T09:17:18.7117600Z  * [new branch]              gh/shunting314/260/orig     -> origin/gh/shunting314/260/orig
2025-12-04T09:17:18.7119932Z  * [new branch]              gh/shunting314/261/base     -> origin/gh/shunting314/261/base
2025-12-04T09:17:18.7121919Z  * [new branch]              gh/shunting314/261/head     -> origin/gh/shunting314/261/head
2025-12-04T09:17:18.7123677Z  * [new branch]              gh/shunting314/261/orig     -> origin/gh/shunting314/261/orig
2025-12-04T09:17:18.7126213Z  * [new branch]              gh/shunting314/262/base     -> origin/gh/shunting314/262/base
2025-12-04T09:17:18.7128129Z  * [new branch]              gh/shunting314/262/head     -> origin/gh/shunting314/262/head
2025-12-04T09:17:18.7130220Z  * [new branch]              gh/shunting314/262/orig     -> origin/gh/shunting314/262/orig
2025-12-04T09:17:18.7132734Z  * [new branch]              gh/shunting314/263/base     -> origin/gh/shunting314/263/base
2025-12-04T09:17:18.7134714Z  * [new branch]              gh/shunting314/263/head     -> origin/gh/shunting314/263/head
2025-12-04T09:17:18.7136602Z  * [new branch]              gh/shunting314/263/orig     -> origin/gh/shunting314/263/orig
2025-12-04T09:17:18.7139284Z  * [new branch]              gh/shunting314/264/base     -> origin/gh/shunting314/264/base
2025-12-04T09:17:18.7141310Z  * [new branch]              gh/shunting314/264/head     -> origin/gh/shunting314/264/head
2025-12-04T09:17:18.7143117Z  * [new branch]              gh/shunting314/264/orig     -> origin/gh/shunting314/264/orig
2025-12-04T09:17:18.7146130Z  * [new branch]              gh/shunting314/265/base     -> origin/gh/shunting314/265/base
2025-12-04T09:17:18.7147633Z  * [new branch]              gh/shunting314/265/head     -> origin/gh/shunting314/265/head
2025-12-04T09:17:18.7149630Z  * [new branch]              gh/shunting314/265/orig     -> origin/gh/shunting314/265/orig
2025-12-04T09:17:18.7152422Z  * [new branch]              gh/shunting314/266/base     -> origin/gh/shunting314/266/base
2025-12-04T09:17:18.7154607Z  * [new branch]              gh/shunting314/266/head     -> origin/gh/shunting314/266/head
2025-12-04T09:17:18.7156503Z  * [new branch]              gh/shunting314/266/orig     -> origin/gh/shunting314/266/orig
2025-12-04T09:17:18.7159357Z  * [new branch]              gh/shunting314/267/base     -> origin/gh/shunting314/267/base
2025-12-04T09:17:18.7161407Z  * [new branch]              gh/shunting314/267/head     -> origin/gh/shunting314/267/head
2025-12-04T09:17:18.7163138Z  * [new branch]              gh/shunting314/267/orig     -> origin/gh/shunting314/267/orig
2025-12-04T09:17:18.7166212Z  * [new branch]              gh/shunting314/268/base     -> origin/gh/shunting314/268/base
2025-12-04T09:17:18.7168229Z  * [new branch]              gh/shunting314/268/head     -> origin/gh/shunting314/268/head
2025-12-04T09:17:18.7170059Z  * [new branch]              gh/shunting314/268/orig     -> origin/gh/shunting314/268/orig
2025-12-04T09:17:18.7173156Z  * [new branch]              gh/shunting314/269/base     -> origin/gh/shunting314/269/base
2025-12-04T09:17:18.7175055Z  * [new branch]              gh/shunting314/269/head     -> origin/gh/shunting314/269/head
2025-12-04T09:17:18.7176852Z  * [new branch]              gh/shunting314/269/orig     -> origin/gh/shunting314/269/orig
2025-12-04T09:17:18.7180123Z  * [new branch]              gh/silverguo/1/base         -> origin/gh/silverguo/1/base
2025-12-04T09:17:18.7182490Z  * [new branch]              gh/silverguo/1/head         -> origin/gh/silverguo/1/head
2025-12-04T09:17:18.7184746Z  * [new branch]              gh/silverguo/2/base         -> origin/gh/silverguo/2/base
2025-12-04T09:17:18.7186536Z  * [new branch]              gh/silverguo/2/head         -> origin/gh/silverguo/2/head
2025-12-04T09:17:18.7188917Z  * [new branch]              gh/silverguo/3/base         -> origin/gh/silverguo/3/base
2025-12-04T09:17:18.7190672Z  * [new branch]              gh/silverguo/3/head         -> origin/gh/silverguo/3/head
2025-12-04T09:17:18.7193089Z  * [new branch]              gh/silverguo/4/base         -> origin/gh/silverguo/4/base
2025-12-04T09:17:18.7195511Z  * [new branch]              gh/silverguo/4/head         -> origin/gh/silverguo/4/head
2025-12-04T09:17:18.7198462Z  * [new branch]              gh/slayton58/39/base        -> origin/gh/slayton58/39/base
2025-12-04T09:17:18.7200241Z  * [new branch]              gh/slayton58/39/head        -> origin/gh/slayton58/39/head
2025-12-04T09:17:18.7202290Z  * [new branch]              gh/slayton58/39/orig        -> origin/gh/slayton58/39/orig
2025-12-04T09:17:18.7204757Z  * [new branch]              gh/slayton58/42/base        -> origin/gh/slayton58/42/base
2025-12-04T09:17:18.7206662Z  * [new branch]              gh/slayton58/42/head        -> origin/gh/slayton58/42/head
2025-12-04T09:17:18.7208644Z  * [new branch]              gh/slayton58/42/orig        -> origin/gh/slayton58/42/orig
2025-12-04T09:17:18.7211663Z  * [new branch]              gh/slayton58/43/base        -> origin/gh/slayton58/43/base
2025-12-04T09:17:18.7213143Z  * [new branch]              gh/slayton58/43/head        -> origin/gh/slayton58/43/head
2025-12-04T09:17:18.7214927Z  * [new branch]              gh/slayton58/43/orig        -> origin/gh/slayton58/43/orig
2025-12-04T09:17:18.7217545Z  * [new branch]              gh/slayton58/44/base        -> origin/gh/slayton58/44/base
2025-12-04T09:17:18.7219789Z  * [new branch]              gh/slayton58/44/head        -> origin/gh/slayton58/44/head
2025-12-04T09:17:18.7221854Z  * [new branch]              gh/slayton58/44/orig        -> origin/gh/slayton58/44/orig
2025-12-04T09:17:18.7224080Z  * [new branch]              gh/slayton58/45/base        -> origin/gh/slayton58/45/base
2025-12-04T09:17:18.7225804Z  * [new branch]              gh/slayton58/45/head        -> origin/gh/slayton58/45/head
2025-12-04T09:17:18.7227693Z  * [new branch]              gh/slayton58/45/orig        -> origin/gh/slayton58/45/orig
2025-12-04T09:17:18.7230276Z  * [new branch]              gh/slayton58/46/base        -> origin/gh/slayton58/46/base
2025-12-04T09:17:18.7232171Z  * [new branch]              gh/slayton58/46/head        -> origin/gh/slayton58/46/head
2025-12-04T09:17:18.7234071Z  * [new branch]              gh/slayton58/46/orig        -> origin/gh/slayton58/46/orig
2025-12-04T09:17:18.7236535Z  * [new branch]              gh/slayton58/6/base         -> origin/gh/slayton58/6/base
2025-12-04T09:17:18.7238419Z  * [new branch]              gh/slayton58/6/head         -> origin/gh/slayton58/6/head
2025-12-04T09:17:18.7240763Z  * [new branch]              gh/slayton58/7/base         -> origin/gh/slayton58/7/base
2025-12-04T09:17:18.7242614Z  * [new branch]              gh/slayton58/7/head         -> origin/gh/slayton58/7/head
2025-12-04T09:17:18.7245784Z  * [new branch]              gh/soulitzer/269/base       -> origin/gh/soulitzer/269/base
2025-12-04T09:17:18.7247569Z  * [new branch]              gh/soulitzer/269/head       -> origin/gh/soulitzer/269/head
2025-12-04T09:17:18.7249748Z  * [new branch]              gh/soulitzer/269/orig       -> origin/gh/soulitzer/269/orig
2025-12-04T09:17:18.7252163Z  * [new branch]              gh/soulitzer/276/base       -> origin/gh/soulitzer/276/base
2025-12-04T09:17:18.7253970Z  * [new branch]              gh/soulitzer/276/head       -> origin/gh/soulitzer/276/head
2025-12-04T09:17:18.7255758Z  * [new branch]              gh/soulitzer/276/orig       -> origin/gh/soulitzer/276/orig
2025-12-04T09:17:18.7258720Z  * [new branch]              gh/soulitzer/287/base       -> origin/gh/soulitzer/287/base
2025-12-04T09:17:18.7260514Z  * [new branch]              gh/soulitzer/287/head       -> origin/gh/soulitzer/287/head
2025-12-04T09:17:18.7262354Z  * [new branch]              gh/soulitzer/287/orig       -> origin/gh/soulitzer/287/orig
2025-12-04T09:17:18.7265183Z  * [new branch]              gh/soulitzer/296/base       -> origin/gh/soulitzer/296/base
2025-12-04T09:17:18.7267084Z  * [new branch]              gh/soulitzer/296/head       -> origin/gh/soulitzer/296/head
2025-12-04T09:17:18.7268945Z  * [new branch]              gh/soulitzer/296/orig       -> origin/gh/soulitzer/296/orig
2025-12-04T09:17:18.7271501Z  * [new branch]              gh/soulitzer/299/base       -> origin/gh/soulitzer/299/base
2025-12-04T09:17:18.7273583Z  * [new branch]              gh/soulitzer/299/head       -> origin/gh/soulitzer/299/head
2025-12-04T09:17:18.7275461Z  * [new branch]              gh/soulitzer/299/orig       -> origin/gh/soulitzer/299/orig
2025-12-04T09:17:18.7277857Z  * [new branch]              gh/soulitzer/300/base       -> origin/gh/soulitzer/300/base
2025-12-04T09:17:18.7279871Z  * [new branch]              gh/soulitzer/300/head       -> origin/gh/soulitzer/300/head
2025-12-04T09:17:18.7281753Z  * [new branch]              gh/soulitzer/300/orig       -> origin/gh/soulitzer/300/orig
2025-12-04T09:17:18.7284355Z  * [new branch]              gh/soulitzer/301/base       -> origin/gh/soulitzer/301/base
2025-12-04T09:17:18.7286627Z  * [new branch]              gh/soulitzer/301/head       -> origin/gh/soulitzer/301/head
2025-12-04T09:17:18.7287996Z  * [new branch]              gh/soulitzer/301/orig       -> origin/gh/soulitzer/301/orig
2025-12-04T09:17:18.7290573Z  * [new branch]              gh/soulitzer/313/base       -> origin/gh/soulitzer/313/base
2025-12-04T09:17:18.7292370Z  * [new branch]              gh/soulitzer/313/head       -> origin/gh/soulitzer/313/head
2025-12-04T09:17:18.7294268Z  * [new branch]              gh/soulitzer/313/orig       -> origin/gh/soulitzer/313/orig
2025-12-04T09:17:18.7296875Z  * [new branch]              gh/soulitzer/319/base       -> origin/gh/soulitzer/319/base
2025-12-04T09:17:18.7298741Z  * [new branch]              gh/soulitzer/319/head       -> origin/gh/soulitzer/319/head
2025-12-04T09:17:18.7300698Z  * [new branch]              gh/soulitzer/319/orig       -> origin/gh/soulitzer/319/orig
2025-12-04T09:17:18.7303207Z  * [new branch]              gh/soulitzer/320/base       -> origin/gh/soulitzer/320/base
2025-12-04T09:17:18.7305050Z  * [new branch]              gh/soulitzer/320/head       -> origin/gh/soulitzer/320/head
2025-12-04T09:17:18.7306816Z  * [new branch]              gh/soulitzer/320/orig       -> origin/gh/soulitzer/320/orig
2025-12-04T09:17:18.7310196Z  * [new branch]              gh/soulitzer/336/base       -> origin/gh/soulitzer/336/base
2025-12-04T09:17:18.7312302Z  * [new branch]              gh/soulitzer/336/head       -> origin/gh/soulitzer/336/head
2025-12-04T09:17:18.7313946Z  * [new branch]              gh/soulitzer/336/orig       -> origin/gh/soulitzer/336/orig
2025-12-04T09:17:18.7316607Z  * [new branch]              gh/soulitzer/347/base       -> origin/gh/soulitzer/347/base
2025-12-04T09:17:18.7318527Z  * [new branch]              gh/soulitzer/347/head       -> origin/gh/soulitzer/347/head
2025-12-04T09:17:18.7320275Z  * [new branch]              gh/soulitzer/347/orig       -> origin/gh/soulitzer/347/orig
2025-12-04T09:17:18.7323435Z  * [new branch]              gh/soulitzer/349/base       -> origin/gh/soulitzer/349/base
2025-12-04T09:17:18.7325618Z  * [new branch]              gh/soulitzer/349/head       -> origin/gh/soulitzer/349/head
2025-12-04T09:17:18.7327521Z  * [new branch]              gh/soulitzer/349/orig       -> origin/gh/soulitzer/349/orig
2025-12-04T09:17:18.7329600Z  * [new branch]              gh/soulitzer/350/base       -> origin/gh/soulitzer/350/base
2025-12-04T09:17:18.7331456Z  * [new branch]              gh/soulitzer/350/head       -> origin/gh/soulitzer/350/head
2025-12-04T09:17:18.7333222Z  * [new branch]              gh/soulitzer/350/orig       -> origin/gh/soulitzer/350/orig
2025-12-04T09:17:18.7335747Z  * [new branch]              gh/soulitzer/351/base       -> origin/gh/soulitzer/351/base
2025-12-04T09:17:18.7337564Z  * [new branch]              gh/soulitzer/351/head       -> origin/gh/soulitzer/351/head
2025-12-04T09:17:18.7339620Z  * [new branch]              gh/soulitzer/351/orig       -> origin/gh/soulitzer/351/orig
2025-12-04T09:17:18.7341993Z  * [new branch]              gh/soulitzer/353/base       -> origin/gh/soulitzer/353/base
2025-12-04T09:17:18.7343966Z  * [new branch]              gh/soulitzer/353/head       -> origin/gh/soulitzer/353/head
2025-12-04T09:17:18.7345802Z  * [new branch]              gh/soulitzer/353/orig       -> origin/gh/soulitzer/353/orig
2025-12-04T09:17:18.7349072Z  * [new branch]              gh/soulitzer/358/base       -> origin/gh/soulitzer/358/base
2025-12-04T09:17:18.7351336Z  * [new branch]              gh/soulitzer/358/head       -> origin/gh/soulitzer/358/head
2025-12-04T09:17:18.7352877Z  * [new branch]              gh/soulitzer/358/orig       -> origin/gh/soulitzer/358/orig
2025-12-04T09:17:18.7355862Z  * [new branch]              gh/soulitzer/359/base       -> origin/gh/soulitzer/359/base
2025-12-04T09:17:18.7357970Z  * [new branch]              gh/soulitzer/359/head       -> origin/gh/soulitzer/359/head
2025-12-04T09:17:18.7359751Z  * [new branch]              gh/soulitzer/359/orig       -> origin/gh/soulitzer/359/orig
2025-12-04T09:17:18.7362216Z  * [new branch]              gh/soulitzer/374/base       -> origin/gh/soulitzer/374/base
2025-12-04T09:17:18.7364197Z  * [new branch]              gh/soulitzer/374/head       -> origin/gh/soulitzer/374/head
2025-12-04T09:17:18.7365962Z  * [new branch]              gh/soulitzer/374/orig       -> origin/gh/soulitzer/374/orig
2025-12-04T09:17:18.7368471Z  * [new branch]              gh/soulitzer/375/base       -> origin/gh/soulitzer/375/base
2025-12-04T09:17:18.7370728Z  * [new branch]              gh/soulitzer/375/head       -> origin/gh/soulitzer/375/head
2025-12-04T09:17:18.7372130Z  * [new branch]              gh/soulitzer/375/orig       -> origin/gh/soulitzer/375/orig
2025-12-04T09:17:18.7374784Z  * [new branch]              gh/soulitzer/380/base       -> origin/gh/soulitzer/380/base
2025-12-04T09:17:18.7376645Z  * [new branch]              gh/soulitzer/380/head       -> origin/gh/soulitzer/380/head
2025-12-04T09:17:18.7378254Z  * [new branch]              gh/soulitzer/380/orig       -> origin/gh/soulitzer/380/orig
2025-12-04T09:17:18.7381063Z  * [new branch]              gh/soulitzer/385/base       -> origin/gh/soulitzer/385/base
2025-12-04T09:17:18.7382868Z  * [new branch]              gh/soulitzer/385/head       -> origin/gh/soulitzer/385/head
2025-12-04T09:17:18.7385171Z  * [new branch]              gh/soulitzer/385/orig       -> origin/gh/soulitzer/385/orig
2025-12-04T09:17:18.7387983Z  * [new branch]              gh/soulitzer/386/base       -> origin/gh/soulitzer/386/base
2025-12-04T09:17:18.7389776Z  * [new branch]              gh/soulitzer/386/head       -> origin/gh/soulitzer/386/head
2025-12-04T09:17:18.7391628Z  * [new branch]              gh/soulitzer/386/orig       -> origin/gh/soulitzer/386/orig
2025-12-04T09:17:18.7394168Z  * [new branch]              gh/soulitzer/387/base       -> origin/gh/soulitzer/387/base
2025-12-04T09:17:18.7396015Z  * [new branch]              gh/soulitzer/387/head       -> origin/gh/soulitzer/387/head
2025-12-04T09:17:18.7397825Z  * [new branch]              gh/soulitzer/387/orig       -> origin/gh/soulitzer/387/orig
2025-12-04T09:17:18.7400301Z  * [new branch]              gh/soulitzer/388/base       -> origin/gh/soulitzer/388/base
2025-12-04T09:17:18.7402225Z  * [new branch]              gh/soulitzer/388/head       -> origin/gh/soulitzer/388/head
2025-12-04T09:17:18.7403925Z  * [new branch]              gh/soulitzer/388/orig       -> origin/gh/soulitzer/388/orig
2025-12-04T09:17:18.7406517Z  * [new branch]              gh/soulitzer/389/base       -> origin/gh/soulitzer/389/base
2025-12-04T09:17:18.7408778Z  * [new branch]              gh/soulitzer/389/head       -> origin/gh/soulitzer/389/head
2025-12-04T09:17:18.7410563Z  * [new branch]              gh/soulitzer/389/orig       -> origin/gh/soulitzer/389/orig
2025-12-04T09:17:18.7412963Z  * [new branch]              gh/soulitzer/390/base       -> origin/gh/soulitzer/390/base
2025-12-04T09:17:18.7414755Z  * [new branch]              gh/soulitzer/390/head       -> origin/gh/soulitzer/390/head
2025-12-04T09:17:18.7416649Z  * [new branch]              gh/soulitzer/390/orig       -> origin/gh/soulitzer/390/orig
2025-12-04T09:17:18.7419246Z  * [new branch]              gh/soulitzer/391/base       -> origin/gh/soulitzer/391/base
2025-12-04T09:17:18.7421223Z  * [new branch]              gh/soulitzer/391/head       -> origin/gh/soulitzer/391/head
2025-12-04T09:17:18.7423057Z  * [new branch]              gh/soulitzer/391/orig       -> origin/gh/soulitzer/391/orig
2025-12-04T09:17:18.7425551Z  * [new branch]              gh/soulitzer/392/base       -> origin/gh/soulitzer/392/base
2025-12-04T09:17:18.7427469Z  * [new branch]              gh/soulitzer/392/head       -> origin/gh/soulitzer/392/head
2025-12-04T09:17:18.7429200Z  * [new branch]              gh/soulitzer/392/orig       -> origin/gh/soulitzer/392/orig
2025-12-04T09:17:18.7432319Z  * [new branch]              gh/swolchok/728/next        -> origin/gh/swolchok/728/next
2025-12-04T09:17:18.7435305Z  * [new branch]              gh/swolchok/819/base        -> origin/gh/swolchok/819/base
2025-12-04T09:17:18.7437069Z  * [new branch]              gh/swolchok/819/head        -> origin/gh/swolchok/819/head
2025-12-04T09:17:18.7438851Z  * [new branch]              gh/swolchok/819/orig        -> origin/gh/swolchok/819/orig
2025-12-04T09:17:18.7441320Z  * [new branch]              gh/swolchok/824/base        -> origin/gh/swolchok/824/base
2025-12-04T09:17:18.7443394Z  * [new branch]              gh/swolchok/824/head        -> origin/gh/swolchok/824/head
2025-12-04T09:17:18.7445061Z  * [new branch]              gh/swolchok/824/orig        -> origin/gh/swolchok/824/orig
2025-12-04T09:17:18.7447478Z  * [new branch]              gh/swolchok/829/base        -> origin/gh/swolchok/829/base
2025-12-04T09:17:18.7449273Z  * [new branch]              gh/swolchok/829/head        -> origin/gh/swolchok/829/head
2025-12-04T09:17:18.7451537Z  * [new branch]              gh/swolchok/829/orig        -> origin/gh/swolchok/829/orig
2025-12-04T09:17:18.7454478Z  * [new branch]              gh/swolchok/839/base        -> origin/gh/swolchok/839/base
2025-12-04T09:17:18.7455984Z  * [new branch]              gh/swolchok/839/head        -> origin/gh/swolchok/839/head
2025-12-04T09:17:18.7457817Z  * [new branch]              gh/swolchok/839/orig        -> origin/gh/swolchok/839/orig
2025-12-04T09:17:18.7460575Z  * [new branch]              gh/swolchok/841/base        -> origin/gh/swolchok/841/base
2025-12-04T09:17:18.7462391Z  * [new branch]              gh/swolchok/841/head        -> origin/gh/swolchok/841/head
2025-12-04T09:17:18.7464308Z  * [new branch]              gh/swolchok/841/orig        -> origin/gh/swolchok/841/orig
2025-12-04T09:17:18.7466788Z  * [new branch]              gh/swolchok/842/base        -> origin/gh/swolchok/842/base
2025-12-04T09:17:18.7468563Z  * [new branch]              gh/swolchok/842/head        -> origin/gh/swolchok/842/head
2025-12-04T09:17:18.7470353Z  * [new branch]              gh/swolchok/842/orig        -> origin/gh/swolchok/842/orig
2025-12-04T09:17:18.7472809Z  * [new branch]              gh/swolchok/845/base        -> origin/gh/swolchok/845/base
2025-12-04T09:17:18.7474609Z  * [new branch]              gh/swolchok/845/head        -> origin/gh/swolchok/845/head
2025-12-04T09:17:18.7476618Z  * [new branch]              gh/swolchok/845/orig        -> origin/gh/swolchok/845/orig
2025-12-04T09:17:18.7479079Z  * [new branch]              gh/swolchok/848/base        -> origin/gh/swolchok/848/base
2025-12-04T09:17:18.7480983Z  * [new branch]              gh/swolchok/848/head        -> origin/gh/swolchok/848/head
2025-12-04T09:17:18.7482857Z  * [new branch]              gh/swolchok/848/orig        -> origin/gh/swolchok/848/orig
2025-12-04T09:17:18.7485439Z  * [new branch]              gh/swolchok/856/base        -> origin/gh/swolchok/856/base
2025-12-04T09:17:18.7487371Z  * [new branch]              gh/swolchok/856/head        -> origin/gh/swolchok/856/head
2025-12-04T09:17:18.7489199Z  * [new branch]              gh/swolchok/856/orig        -> origin/gh/swolchok/856/orig
2025-12-04T09:17:18.7491733Z  * [new branch]              gh/swolchok/860/base        -> origin/gh/swolchok/860/base
2025-12-04T09:17:18.7493587Z  * [new branch]              gh/swolchok/860/head        -> origin/gh/swolchok/860/head
2025-12-04T09:17:18.7495347Z  * [new branch]              gh/swolchok/860/orig        -> origin/gh/swolchok/860/orig
2025-12-04T09:17:18.7498134Z  * [new branch]              gh/swolchok/861/base        -> origin/gh/swolchok/861/base
2025-12-04T09:17:18.7500246Z  * [new branch]              gh/swolchok/861/head        -> origin/gh/swolchok/861/head
2025-12-04T09:17:18.7502208Z  * [new branch]              gh/swolchok/861/orig        -> origin/gh/swolchok/861/orig
2025-12-04T09:17:18.7504762Z  * [new branch]              gh/swolchok/862/base        -> origin/gh/swolchok/862/base
2025-12-04T09:17:18.7506534Z  * [new branch]              gh/swolchok/862/head        -> origin/gh/swolchok/862/head
2025-12-04T09:17:18.7508182Z  * [new branch]              gh/swolchok/862/orig        -> origin/gh/swolchok/862/orig
2025-12-04T09:17:18.7511321Z  * [new branch]              gh/swolchok/863/base        -> origin/gh/swolchok/863/base
2025-12-04T09:17:18.7513162Z  * [new branch]              gh/swolchok/863/head        -> origin/gh/swolchok/863/head
2025-12-04T09:17:18.7515066Z  * [new branch]              gh/swolchok/863/orig        -> origin/gh/swolchok/863/orig
2025-12-04T09:17:18.7517884Z  * [new branch]              gh/swolchok/864/base        -> origin/gh/swolchok/864/base
2025-12-04T09:17:18.7519543Z  * [new branch]              gh/swolchok/864/head        -> origin/gh/swolchok/864/head
2025-12-04T09:17:18.7521332Z  * [new branch]              gh/swolchok/864/orig        -> origin/gh/swolchok/864/orig
2025-12-04T09:17:18.7524008Z  * [new branch]              gh/swolchok/865/base        -> origin/gh/swolchok/865/base
2025-12-04T09:17:18.7526038Z  * [new branch]              gh/swolchok/865/head        -> origin/gh/swolchok/865/head
2025-12-04T09:17:18.7527948Z  * [new branch]              gh/swolchok/865/orig        -> origin/gh/swolchok/865/orig
2025-12-04T09:17:18.7531063Z  * [new branch]              gh/swolchok/866/base        -> origin/gh/swolchok/866/base
2025-12-04T09:17:18.7532845Z  * [new branch]              gh/swolchok/866/head        -> origin/gh/swolchok/866/head
2025-12-04T09:17:18.7534705Z  * [new branch]              gh/swolchok/866/orig        -> origin/gh/swolchok/866/orig
2025-12-04T09:17:18.7537330Z  * [new branch]              gh/swolchok/867/base        -> origin/gh/swolchok/867/base
2025-12-04T09:17:18.7539085Z  * [new branch]              gh/swolchok/867/head        -> origin/gh/swolchok/867/head
2025-12-04T09:17:18.7541121Z  * [new branch]              gh/swolchok/867/orig        -> origin/gh/swolchok/867/orig
2025-12-04T09:17:18.7543581Z  * [new branch]              gh/swolchok/868/base        -> origin/gh/swolchok/868/base
2025-12-04T09:17:18.7545690Z  * [new branch]              gh/swolchok/868/head        -> origin/gh/swolchok/868/head
2025-12-04T09:17:18.7547487Z  * [new branch]              gh/swolchok/868/orig        -> origin/gh/swolchok/868/orig
2025-12-04T09:17:18.7549735Z  * [new branch]              gh/swolchok/869/base        -> origin/gh/swolchok/869/base
2025-12-04T09:17:18.7551630Z  * [new branch]              gh/swolchok/869/head        -> origin/gh/swolchok/869/head
2025-12-04T09:17:18.7553588Z  * [new branch]              gh/swolchok/869/orig        -> origin/gh/swolchok/869/orig
2025-12-04T09:17:18.7556218Z  * [new branch]              gh/swolchok/870/base        -> origin/gh/swolchok/870/base
2025-12-04T09:17:18.7558078Z  * [new branch]              gh/swolchok/870/head        -> origin/gh/swolchok/870/head
2025-12-04T09:17:18.7559987Z  * [new branch]              gh/swolchok/870/orig        -> origin/gh/swolchok/870/orig
2025-12-04T09:17:18.7562553Z  * [new branch]              gh/swolchok/871/base        -> origin/gh/swolchok/871/base
2025-12-04T09:17:18.7564528Z  * [new branch]              gh/swolchok/871/head        -> origin/gh/swolchok/871/head
2025-12-04T09:17:18.7566410Z  * [new branch]              gh/swolchok/871/orig        -> origin/gh/swolchok/871/orig
2025-12-04T09:17:18.7569701Z  * [new branch]              gh/teja-rao/4/base          -> origin/gh/teja-rao/4/base
2025-12-04T09:17:18.7571588Z  * [new branch]              gh/teja-rao/4/head          -> origin/gh/teja-rao/4/head
2025-12-04T09:17:18.7573433Z  * [new branch]              gh/teja-rao/4/orig          -> origin/gh/teja-rao/4/orig
2025-12-04T09:17:18.7576619Z  * [new branch]              gh/tianyu-l/2/base          -> origin/gh/tianyu-l/2/base
2025-12-04T09:17:18.7578445Z  * [new branch]              gh/tianyu-l/2/head          -> origin/gh/tianyu-l/2/head
2025-12-04T09:17:18.7580403Z  * [new branch]              gh/tianyu-l/2/orig          -> origin/gh/tianyu-l/2/orig
2025-12-04T09:17:18.7583050Z  * [new branch]              gh/tianyu-l/3/base          -> origin/gh/tianyu-l/3/base
2025-12-04T09:17:18.7584916Z  * [new branch]              gh/tianyu-l/3/orig          -> origin/gh/tianyu-l/3/orig
2025-12-04T09:17:18.7587407Z  * [new branch]              gh/tianyu-l/4/base          -> origin/gh/tianyu-l/4/base
2025-12-04T09:17:18.7589194Z  * [new branch]              gh/tianyu-l/4/head          -> origin/gh/tianyu-l/4/head
2025-12-04T09:17:18.7591032Z  * [new branch]              gh/tianyu-l/4/orig          -> origin/gh/tianyu-l/4/orig
2025-12-04T09:17:18.7594551Z  * [new branch]              gh/tugsbayasgalan/10/base   -> origin/gh/tugsbayasgalan/10/base
2025-12-04T09:17:18.7596333Z  * [new branch]              gh/tugsbayasgalan/10/head   -> origin/gh/tugsbayasgalan/10/head
2025-12-04T09:17:18.7598144Z  * [new branch]              gh/tugsbayasgalan/10/orig   -> origin/gh/tugsbayasgalan/10/orig
2025-12-04T09:17:18.7600697Z  * [new branch]              gh/tugsbayasgalan/13/base   -> origin/gh/tugsbayasgalan/13/base
2025-12-04T09:17:18.7602581Z  * [new branch]              gh/tugsbayasgalan/13/head   -> origin/gh/tugsbayasgalan/13/head
2025-12-04T09:17:18.7604434Z  * [new branch]              gh/tugsbayasgalan/13/orig   -> origin/gh/tugsbayasgalan/13/orig
2025-12-04T09:17:18.7607138Z  * [new branch]              gh/tugsbayasgalan/17/base   -> origin/gh/tugsbayasgalan/17/base
2025-12-04T09:17:18.7615630Z  * [new branch]              gh/tugsbayasgalan/17/head   -> origin/gh/tugsbayasgalan/17/head
2025-12-04T09:17:18.7616080Z  * [new branch]              gh/tugsbayasgalan/17/orig   -> origin/gh/tugsbayasgalan/17/orig
2025-12-04T09:17:18.7616425Z  * [new branch]              gh/tugsbayasgalan/2/base    -> origin/gh/tugsbayasgalan/2/base
2025-12-04T09:17:18.7616679Z  * [new branch]              gh/tugsbayasgalan/2/head    -> origin/gh/tugsbayasgalan/2/head
2025-12-04T09:17:18.7617434Z  * [new branch]              gh/tugsbayasgalan/2/orig    -> origin/gh/tugsbayasgalan/2/orig
2025-12-04T09:17:18.7620856Z  * [new branch]              gh/tugsbayasgalan/28/base   -> origin/gh/tugsbayasgalan/28/base
2025-12-04T09:17:18.7622586Z  * [new branch]              gh/tugsbayasgalan/28/head   -> origin/gh/tugsbayasgalan/28/head
2025-12-04T09:17:18.7624397Z  * [new branch]              gh/tugsbayasgalan/28/orig   -> origin/gh/tugsbayasgalan/28/orig
2025-12-04T09:17:18.7626805Z  * [new branch]              gh/tugsbayasgalan/32/base   -> origin/gh/tugsbayasgalan/32/base
2025-12-04T09:17:18.7629293Z  * [new branch]              gh/tugsbayasgalan/32/head   -> origin/gh/tugsbayasgalan/32/head
2025-12-04T09:17:18.7631385Z  * [new branch]              gh/tugsbayasgalan/32/orig   -> origin/gh/tugsbayasgalan/32/orig
2025-12-04T09:17:18.7634094Z  * [new branch]              gh/tugsbayasgalan/35/base   -> origin/gh/tugsbayasgalan/35/base
2025-12-04T09:17:18.7635954Z  * [new branch]              gh/tugsbayasgalan/35/head   -> origin/gh/tugsbayasgalan/35/head
2025-12-04T09:17:18.7637767Z  * [new branch]              gh/tugsbayasgalan/35/orig   -> origin/gh/tugsbayasgalan/35/orig
2025-12-04T09:17:18.7640283Z  * [new branch]              gh/tugsbayasgalan/36/base   -> origin/gh/tugsbayasgalan/36/base
2025-12-04T09:17:18.7642113Z  * [new branch]              gh/tugsbayasgalan/36/head   -> origin/gh/tugsbayasgalan/36/head
2025-12-04T09:17:18.7643950Z  * [new branch]              gh/tugsbayasgalan/36/orig   -> origin/gh/tugsbayasgalan/36/orig
2025-12-04T09:17:18.7646535Z  * [new branch]              gh/tugsbayasgalan/37/base   -> origin/gh/tugsbayasgalan/37/base
2025-12-04T09:17:18.7648358Z  * [new branch]              gh/tugsbayasgalan/37/head   -> origin/gh/tugsbayasgalan/37/head
2025-12-04T09:17:18.7650151Z  * [new branch]              gh/tugsbayasgalan/37/orig   -> origin/gh/tugsbayasgalan/37/orig
2025-12-04T09:17:18.7652646Z  * [new branch]              gh/tugsbayasgalan/43/base   -> origin/gh/tugsbayasgalan/43/base
2025-12-04T09:17:18.7654474Z  * [new branch]              gh/tugsbayasgalan/43/head   -> origin/gh/tugsbayasgalan/43/head
2025-12-04T09:17:18.7656874Z  * [new branch]              gh/tugsbayasgalan/43/orig   -> origin/gh/tugsbayasgalan/43/orig
2025-12-04T09:17:18.7659380Z  * [new branch]              gh/tugsbayasgalan/48/base   -> origin/gh/tugsbayasgalan/48/base
2025-12-04T09:17:18.7661277Z  * [new branch]              gh/tugsbayasgalan/48/head   -> origin/gh/tugsbayasgalan/48/head
2025-12-04T09:17:18.7663040Z  * [new branch]              gh/tugsbayasgalan/48/orig   -> origin/gh/tugsbayasgalan/48/orig
2025-12-04T09:17:18.7665620Z  * [new branch]              gh/tugsbayasgalan/51/base   -> origin/gh/tugsbayasgalan/51/base
2025-12-04T09:17:18.7667622Z  * [new branch]              gh/tugsbayasgalan/51/head   -> origin/gh/tugsbayasgalan/51/head
2025-12-04T09:17:18.7669369Z  * [new branch]              gh/tugsbayasgalan/51/orig   -> origin/gh/tugsbayasgalan/51/orig
2025-12-04T09:17:18.7671693Z  * [new branch]              gh/tugsbayasgalan/52/base   -> origin/gh/tugsbayasgalan/52/base
2025-12-04T09:17:18.7673609Z  * [new branch]              gh/tugsbayasgalan/52/head   -> origin/gh/tugsbayasgalan/52/head
2025-12-04T09:17:18.7675455Z  * [new branch]              gh/tugsbayasgalan/52/orig   -> origin/gh/tugsbayasgalan/52/orig
2025-12-04T09:17:18.7677962Z  * [new branch]              gh/tugsbayasgalan/53/base   -> origin/gh/tugsbayasgalan/53/base
2025-12-04T09:17:18.7679778Z  * [new branch]              gh/tugsbayasgalan/53/head   -> origin/gh/tugsbayasgalan/53/head
2025-12-04T09:17:18.7682103Z  * [new branch]              gh/tugsbayasgalan/53/orig   -> origin/gh/tugsbayasgalan/53/orig
2025-12-04T09:17:18.7684871Z  * [new branch]              gh/tugsbayasgalan/55/base   -> origin/gh/tugsbayasgalan/55/base
2025-12-04T09:17:18.7686825Z  * [new branch]              gh/tugsbayasgalan/55/head   -> origin/gh/tugsbayasgalan/55/head
2025-12-04T09:17:18.7688734Z  * [new branch]              gh/tugsbayasgalan/55/orig   -> origin/gh/tugsbayasgalan/55/orig
2025-12-04T09:17:18.7691379Z  * [new branch]              gh/tugsbayasgalan/59/base   -> origin/gh/tugsbayasgalan/59/base
2025-12-04T09:17:18.7693339Z  * [new branch]              gh/tugsbayasgalan/59/head   -> origin/gh/tugsbayasgalan/59/head
2025-12-04T09:17:18.7695152Z  * [new branch]              gh/tugsbayasgalan/59/orig   -> origin/gh/tugsbayasgalan/59/orig
2025-12-04T09:17:18.7697577Z  * [new branch]              gh/tugsbayasgalan/6/base    -> origin/gh/tugsbayasgalan/6/base
2025-12-04T09:17:18.7699437Z  * [new branch]              gh/tugsbayasgalan/6/head    -> origin/gh/tugsbayasgalan/6/head
2025-12-04T09:17:18.7701355Z  * [new branch]              gh/tugsbayasgalan/6/orig    -> origin/gh/tugsbayasgalan/6/orig
2025-12-04T09:17:18.7703704Z  * [new branch]              gh/tugsbayasgalan/60/base   -> origin/gh/tugsbayasgalan/60/base
2025-12-04T09:17:18.7705553Z  * [new branch]              gh/tugsbayasgalan/60/head   -> origin/gh/tugsbayasgalan/60/head
2025-12-04T09:17:18.7707338Z  * [new branch]              gh/tugsbayasgalan/60/orig   -> origin/gh/tugsbayasgalan/60/orig
2025-12-04T09:17:18.7710886Z  * [new branch]              gh/tugsbayasgalan/61/base   -> origin/gh/tugsbayasgalan/61/base
2025-12-04T09:17:18.7712545Z  * [new branch]              gh/tugsbayasgalan/61/head   -> origin/gh/tugsbayasgalan/61/head
2025-12-04T09:17:18.7714885Z  * [new branch]              gh/tugsbayasgalan/61/orig   -> origin/gh/tugsbayasgalan/61/orig
2025-12-04T09:17:18.7717545Z  * [new branch]              gh/tugsbayasgalan/63/base   -> origin/gh/tugsbayasgalan/63/base
2025-12-04T09:17:18.7719327Z  * [new branch]              gh/tugsbayasgalan/63/head   -> origin/gh/tugsbayasgalan/63/head
2025-12-04T09:17:18.7721163Z  * [new branch]              gh/tugsbayasgalan/63/orig   -> origin/gh/tugsbayasgalan/63/orig
2025-12-04T09:17:18.7723793Z  * [new branch]              gh/tugsbayasgalan/67/base   -> origin/gh/tugsbayasgalan/67/base
2025-12-04T09:17:18.7725597Z  * [new branch]              gh/tugsbayasgalan/67/head   -> origin/gh/tugsbayasgalan/67/head
2025-12-04T09:17:18.7727435Z  * [new branch]              gh/tugsbayasgalan/67/orig   -> origin/gh/tugsbayasgalan/67/orig
2025-12-04T09:17:18.7730255Z  * [new branch]              gh/tugsbayasgalan/68/base   -> origin/gh/tugsbayasgalan/68/base
2025-12-04T09:17:18.7732168Z  * [new branch]              gh/tugsbayasgalan/68/head   -> origin/gh/tugsbayasgalan/68/head
2025-12-04T09:17:18.7733913Z  * [new branch]              gh/tugsbayasgalan/68/orig   -> origin/gh/tugsbayasgalan/68/orig
2025-12-04T09:17:18.7736641Z  * [new branch]              gh/tugsbayasgalan/7/base    -> origin/gh/tugsbayasgalan/7/base
2025-12-04T09:17:18.7738486Z  * [new branch]              gh/tugsbayasgalan/7/head    -> origin/gh/tugsbayasgalan/7/head
2025-12-04T09:17:18.7740608Z  * [new branch]              gh/tugsbayasgalan/7/orig    -> origin/gh/tugsbayasgalan/7/orig
2025-12-04T09:17:18.7743312Z  * [new branch]              gh/tugsbayasgalan/70/base   -> origin/gh/tugsbayasgalan/70/base
2025-12-04T09:17:18.7745282Z  * [new branch]              gh/tugsbayasgalan/70/head   -> origin/gh/tugsbayasgalan/70/head
2025-12-04T09:17:18.7747139Z  * [new branch]              gh/tugsbayasgalan/70/orig   -> origin/gh/tugsbayasgalan/70/orig
2025-12-04T09:17:18.7749915Z  * [new branch]              gh/tugsbayasgalan/71/base   -> origin/gh/tugsbayasgalan/71/base
2025-12-04T09:17:18.7751868Z  * [new branch]              gh/tugsbayasgalan/71/head   -> origin/gh/tugsbayasgalan/71/head
2025-12-04T09:17:18.7753765Z  * [new branch]              gh/tugsbayasgalan/71/orig   -> origin/gh/tugsbayasgalan/71/orig
2025-12-04T09:17:18.7756514Z  * [new branch]              gh/tugsbayasgalan/72/base   -> origin/gh/tugsbayasgalan/72/base
2025-12-04T09:17:18.7758383Z  * [new branch]              gh/tugsbayasgalan/72/head   -> origin/gh/tugsbayasgalan/72/head
2025-12-04T09:17:18.7760205Z  * [new branch]              gh/tugsbayasgalan/72/orig   -> origin/gh/tugsbayasgalan/72/orig
2025-12-04T09:17:18.7762892Z  * [new branch]              gh/tugsbayasgalan/73/base   -> origin/gh/tugsbayasgalan/73/base
2025-12-04T09:17:18.7764848Z  * [new branch]              gh/tugsbayasgalan/73/head   -> origin/gh/tugsbayasgalan/73/head
2025-12-04T09:17:18.7766667Z  * [new branch]              gh/tugsbayasgalan/73/orig   -> origin/gh/tugsbayasgalan/73/orig
2025-12-04T09:17:18.7769423Z  * [new branch]              gh/tugsbayasgalan/74/base   -> origin/gh/tugsbayasgalan/74/base
2025-12-04T09:17:18.7771345Z  * [new branch]              gh/tugsbayasgalan/74/head   -> origin/gh/tugsbayasgalan/74/head
2025-12-04T09:17:18.7773184Z  * [new branch]              gh/tugsbayasgalan/74/orig   -> origin/gh/tugsbayasgalan/74/orig
2025-12-04T09:17:18.7775851Z  * [new branch]              gh/tugsbayasgalan/75/base   -> origin/gh/tugsbayasgalan/75/base
2025-12-04T09:17:18.7777632Z  * [new branch]              gh/tugsbayasgalan/75/head   -> origin/gh/tugsbayasgalan/75/head
2025-12-04T09:17:18.7779471Z  * [new branch]              gh/tugsbayasgalan/75/orig   -> origin/gh/tugsbayasgalan/75/orig
2025-12-04T09:17:18.7782004Z  * [new branch]              gh/tugsbayasgalan/76/base   -> origin/gh/tugsbayasgalan/76/base
2025-12-04T09:17:18.7784032Z  * [new branch]              gh/tugsbayasgalan/76/head   -> origin/gh/tugsbayasgalan/76/head
2025-12-04T09:17:18.7786073Z  * [new branch]              gh/tugsbayasgalan/76/orig   -> origin/gh/tugsbayasgalan/76/orig
2025-12-04T09:17:18.7788856Z  * [new branch]              gh/tugsbayasgalan/77/base   -> origin/gh/tugsbayasgalan/77/base
2025-12-04T09:17:18.7790611Z  * [new branch]              gh/tugsbayasgalan/77/head   -> origin/gh/tugsbayasgalan/77/head
2025-12-04T09:17:18.7792415Z  * [new branch]              gh/tugsbayasgalan/77/orig   -> origin/gh/tugsbayasgalan/77/orig
2025-12-04T09:17:18.7795035Z  * [new branch]              gh/tugsbayasgalan/78/base   -> origin/gh/tugsbayasgalan/78/base
2025-12-04T09:17:18.7797101Z  * [new branch]              gh/tugsbayasgalan/78/head   -> origin/gh/tugsbayasgalan/78/head
2025-12-04T09:17:18.7798933Z  * [new branch]              gh/tugsbayasgalan/78/orig   -> origin/gh/tugsbayasgalan/78/orig
2025-12-04T09:17:18.7801526Z  * [new branch]              gh/tugsbayasgalan/79/base   -> origin/gh/tugsbayasgalan/79/base
2025-12-04T09:17:18.7803356Z  * [new branch]              gh/tugsbayasgalan/79/head   -> origin/gh/tugsbayasgalan/79/head
2025-12-04T09:17:18.7805188Z  * [new branch]              gh/tugsbayasgalan/79/orig   -> origin/gh/tugsbayasgalan/79/orig
2025-12-04T09:17:18.7807911Z  * [new branch]              gh/tugsbayasgalan/8/base    -> origin/gh/tugsbayasgalan/8/base
2025-12-04T09:17:18.7809788Z  * [new branch]              gh/tugsbayasgalan/8/head    -> origin/gh/tugsbayasgalan/8/head
2025-12-04T09:17:18.7811746Z  * [new branch]              gh/tugsbayasgalan/8/orig    -> origin/gh/tugsbayasgalan/8/orig
2025-12-04T09:17:18.7814233Z  * [new branch]              gh/tugsbayasgalan/80/base   -> origin/gh/tugsbayasgalan/80/base
2025-12-04T09:17:18.7815991Z  * [new branch]              gh/tugsbayasgalan/80/head   -> origin/gh/tugsbayasgalan/80/head
2025-12-04T09:17:18.7817972Z  * [new branch]              gh/tugsbayasgalan/80/orig   -> origin/gh/tugsbayasgalan/80/orig
2025-12-04T09:17:18.7820854Z  * [new branch]              gh/tugsbayasgalan/81/base   -> origin/gh/tugsbayasgalan/81/base
2025-12-04T09:17:18.7822580Z  * [new branch]              gh/tugsbayasgalan/81/head   -> origin/gh/tugsbayasgalan/81/head
2025-12-04T09:17:18.7824288Z  * [new branch]              gh/tugsbayasgalan/81/orig   -> origin/gh/tugsbayasgalan/81/orig
2025-12-04T09:17:18.7827441Z  * [new branch]              gh/tugsbayasgalan/82/base   -> origin/gh/tugsbayasgalan/82/base
2025-12-04T09:17:18.7829352Z  * [new branch]              gh/tugsbayasgalan/82/head   -> origin/gh/tugsbayasgalan/82/head
2025-12-04T09:17:18.7831261Z  * [new branch]              gh/tugsbayasgalan/82/orig   -> origin/gh/tugsbayasgalan/82/orig
2025-12-04T09:17:18.7833706Z  * [new branch]              gh/tugsbayasgalan/83/base   -> origin/gh/tugsbayasgalan/83/base
2025-12-04T09:17:18.7835548Z  * [new branch]              gh/tugsbayasgalan/83/head   -> origin/gh/tugsbayasgalan/83/head
2025-12-04T09:17:18.7837404Z  * [new branch]              gh/tugsbayasgalan/83/orig   -> origin/gh/tugsbayasgalan/83/orig
2025-12-04T09:17:18.7840320Z  * [new branch]              gh/tugsbayasgalan/84/base   -> origin/gh/tugsbayasgalan/84/base
2025-12-04T09:17:18.7842159Z  * [new branch]              gh/tugsbayasgalan/84/head   -> origin/gh/tugsbayasgalan/84/head
2025-12-04T09:17:18.7844033Z  * [new branch]              gh/tugsbayasgalan/84/orig   -> origin/gh/tugsbayasgalan/84/orig
2025-12-04T09:17:18.7847107Z  * [new branch]              gh/tugsbayasgalan/85/base   -> origin/gh/tugsbayasgalan/85/base
2025-12-04T09:17:18.7849030Z  * [new branch]              gh/tugsbayasgalan/85/head   -> origin/gh/tugsbayasgalan/85/head
2025-12-04T09:17:18.7850872Z  * [new branch]              gh/tugsbayasgalan/85/orig   -> origin/gh/tugsbayasgalan/85/orig
2025-12-04T09:17:18.7853470Z  * [new branch]              gh/tugsbayasgalan/86/base   -> origin/gh/tugsbayasgalan/86/base
2025-12-04T09:17:18.7855301Z  * [new branch]              gh/tugsbayasgalan/86/head   -> origin/gh/tugsbayasgalan/86/head
2025-12-04T09:17:18.7857123Z  * [new branch]              gh/tugsbayasgalan/86/orig   -> origin/gh/tugsbayasgalan/86/orig
2025-12-04T09:17:18.7860395Z  * [new branch]              gh/tugsbayasgalan/87/base   -> origin/gh/tugsbayasgalan/87/base
2025-12-04T09:17:18.7862076Z  * [new branch]              gh/tugsbayasgalan/87/head   -> origin/gh/tugsbayasgalan/87/head
2025-12-04T09:17:18.7863857Z  * [new branch]              gh/tugsbayasgalan/87/orig   -> origin/gh/tugsbayasgalan/87/orig
2025-12-04T09:17:18.7866604Z  * [new branch]              gh/tugsbayasgalan/88/base   -> origin/gh/tugsbayasgalan/88/base
2025-12-04T09:17:18.7868404Z  * [new branch]              gh/tugsbayasgalan/88/head   -> origin/gh/tugsbayasgalan/88/head
2025-12-04T09:17:18.7870234Z  * [new branch]              gh/tugsbayasgalan/88/orig   -> origin/gh/tugsbayasgalan/88/orig
2025-12-04T09:17:18.7872852Z  * [new branch]              gh/tugsbayasgalan/89/base   -> origin/gh/tugsbayasgalan/89/base
2025-12-04T09:17:18.7874739Z  * [new branch]              gh/tugsbayasgalan/89/head   -> origin/gh/tugsbayasgalan/89/head
2025-12-04T09:17:18.7876476Z  * [new branch]              gh/tugsbayasgalan/89/orig   -> origin/gh/tugsbayasgalan/89/orig
2025-12-04T09:17:18.7879033Z  * [new branch]              gh/tugsbayasgalan/9/base    -> origin/gh/tugsbayasgalan/9/base
2025-12-04T09:17:18.7880742Z  * [new branch]              gh/tugsbayasgalan/9/head    -> origin/gh/tugsbayasgalan/9/head
2025-12-04T09:17:18.7882620Z  * [new branch]              gh/tugsbayasgalan/9/orig    -> origin/gh/tugsbayasgalan/9/orig
2025-12-04T09:17:18.7885625Z  * [new branch]              gh/tugsbayasgalan/90/base   -> origin/gh/tugsbayasgalan/90/base
2025-12-04T09:17:18.7887725Z  * [new branch]              gh/tugsbayasgalan/90/head   -> origin/gh/tugsbayasgalan/90/head
2025-12-04T09:17:18.7889537Z  * [new branch]              gh/tugsbayasgalan/90/orig   -> origin/gh/tugsbayasgalan/90/orig
2025-12-04T09:17:18.7892394Z  * [new branch]              gh/tugsbayasgalan/91/base   -> origin/gh/tugsbayasgalan/91/base
2025-12-04T09:17:18.7894147Z  * [new branch]              gh/tugsbayasgalan/91/head   -> origin/gh/tugsbayasgalan/91/head
2025-12-04T09:17:18.7896000Z  * [new branch]              gh/tugsbayasgalan/91/orig   -> origin/gh/tugsbayasgalan/91/orig
2025-12-04T09:17:18.7898656Z  * [new branch]              gh/tugsbayasgalan/92/base   -> origin/gh/tugsbayasgalan/92/base
2025-12-04T09:17:18.7900659Z  * [new branch]              gh/tugsbayasgalan/92/head   -> origin/gh/tugsbayasgalan/92/head
2025-12-04T09:17:18.7902489Z  * [new branch]              gh/tugsbayasgalan/92/orig   -> origin/gh/tugsbayasgalan/92/orig
2025-12-04T09:17:18.7905313Z  * [new branch]              gh/tugsbayasgalan/93/base   -> origin/gh/tugsbayasgalan/93/base
2025-12-04T09:17:18.7907196Z  * [new branch]              gh/tugsbayasgalan/93/head   -> origin/gh/tugsbayasgalan/93/head
2025-12-04T09:17:18.7909325Z  * [new branch]              gh/tugsbayasgalan/93/orig   -> origin/gh/tugsbayasgalan/93/orig
2025-12-04T09:17:18.7912358Z  * [new branch]              gh/v0i0/14/base             -> origin/gh/v0i0/14/base
2025-12-04T09:17:18.7914098Z  * [new branch]              gh/v0i0/14/head             -> origin/gh/v0i0/14/head
2025-12-04T09:17:18.7915896Z  * [new branch]              gh/v0i0/14/orig             -> origin/gh/v0i0/14/orig
2025-12-04T09:17:18.7918308Z  * [new branch]              gh/v0i0/15/base             -> origin/gh/v0i0/15/base
2025-12-04T09:17:18.7920214Z  * [new branch]              gh/v0i0/15/head             -> origin/gh/v0i0/15/head
2025-12-04T09:17:18.7922090Z  * [new branch]              gh/v0i0/15/orig             -> origin/gh/v0i0/15/orig
2025-12-04T09:17:18.7924640Z  * [new branch]              gh/v0i0/16/base             -> origin/gh/v0i0/16/base
2025-12-04T09:17:18.7926457Z  * [new branch]              gh/v0i0/16/head             -> origin/gh/v0i0/16/head
2025-12-04T09:17:18.7928255Z  * [new branch]              gh/v0i0/16/orig             -> origin/gh/v0i0/16/orig
2025-12-04T09:17:18.7930712Z  * [new branch]              gh/v0i0/17/base             -> origin/gh/v0i0/17/base
2025-12-04T09:17:18.7932546Z  * [new branch]              gh/v0i0/17/head             -> origin/gh/v0i0/17/head
2025-12-04T09:17:18.7934340Z  * [new branch]              gh/v0i0/17/orig             -> origin/gh/v0i0/17/orig
2025-12-04T09:17:18.7936904Z  * [new branch]              gh/v0i0/18/base             -> origin/gh/v0i0/18/base
2025-12-04T09:17:18.7938814Z  * [new branch]              gh/v0i0/18/head             -> origin/gh/v0i0/18/head
2025-12-04T09:17:18.7941315Z  * [new branch]              gh/v0i0/18/orig             -> origin/gh/v0i0/18/orig
2025-12-04T09:17:18.7943903Z  * [new branch]              gh/v0i0/19/base             -> origin/gh/v0i0/19/base
2025-12-04T09:17:18.7945680Z  * [new branch]              gh/v0i0/19/head             -> origin/gh/v0i0/19/head
2025-12-04T09:17:18.7947534Z  * [new branch]              gh/v0i0/19/orig             -> origin/gh/v0i0/19/orig
2025-12-04T09:17:18.7950654Z  * [new branch]              gh/vishal9-team/1/base      -> origin/gh/vishal9-team/1/base
2025-12-04T09:17:18.7952498Z  * [new branch]              gh/vishal9-team/1/head      -> origin/gh/vishal9-team/1/head
2025-12-04T09:17:18.7954893Z  * [new branch]              gh/vishal9-team/2/base      -> origin/gh/vishal9-team/2/base
2025-12-04T09:17:18.7956734Z  * [new branch]              gh/vishal9-team/2/head      -> origin/gh/vishal9-team/2/head
2025-12-04T09:17:18.7958613Z  * [new branch]              gh/vishal9-team/2/orig      -> origin/gh/vishal9-team/2/orig
2025-12-04T09:17:18.7961278Z  * [new branch]              gh/vishal9-team/3/base      -> origin/gh/vishal9-team/3/base
2025-12-04T09:17:18.7963010Z  * [new branch]              gh/vishal9-team/3/head      -> origin/gh/vishal9-team/3/head
2025-12-04T09:17:18.7964892Z  * [new branch]              gh/vishal9-team/3/orig      -> origin/gh/vishal9-team/3/orig
2025-12-04T09:17:18.7967883Z  * [new branch]              gh/vishal9-team/4/base      -> origin/gh/vishal9-team/4/base
2025-12-04T09:17:18.7969727Z  * [new branch]              gh/vishal9-team/4/head      -> origin/gh/vishal9-team/4/head
2025-12-04T09:17:18.7971525Z  * [new branch]              gh/vishal9-team/4/orig      -> origin/gh/vishal9-team/4/orig
2025-12-04T09:17:18.7974510Z  * [new branch]              gh/vkuzo/1/next             -> origin/gh/vkuzo/1/next
2025-12-04T09:17:18.7977041Z  * [new branch]              gh/vkuzo/2/next             -> origin/gh/vkuzo/2/next
2025-12-04T09:17:18.7979583Z  * [new branch]              gh/vkuzo/3/next             -> origin/gh/vkuzo/3/next
2025-12-04T09:17:18.7982715Z  * [new branch]              gh/wconstab/424/base        -> origin/gh/wconstab/424/base
2025-12-04T09:17:18.7984548Z  * [new branch]              gh/wconstab/424/head        -> origin/gh/wconstab/424/head
2025-12-04T09:17:18.7986513Z  * [new branch]              gh/wconstab/424/orig        -> origin/gh/wconstab/424/orig
2025-12-04T09:17:18.7989116Z  * [new branch]              gh/wconstab/435/base        -> origin/gh/wconstab/435/base
2025-12-04T09:17:18.7991011Z  * [new branch]              gh/wconstab/435/head        -> origin/gh/wconstab/435/head
2025-12-04T09:17:18.7992846Z  * [new branch]              gh/wconstab/435/orig        -> origin/gh/wconstab/435/orig
2025-12-04T09:17:18.7995362Z  * [new branch]              gh/wconstab/444/base        -> origin/gh/wconstab/444/base
2025-12-04T09:17:18.7997629Z  * [new branch]              gh/wconstab/444/head        -> origin/gh/wconstab/444/head
2025-12-04T09:17:18.7999519Z  * [new branch]              gh/wconstab/444/orig        -> origin/gh/wconstab/444/orig
2025-12-04T09:17:18.8002062Z  * [new branch]              gh/wconstab/447/base        -> origin/gh/wconstab/447/base
2025-12-04T09:17:18.8003845Z  * [new branch]              gh/wconstab/447/head        -> origin/gh/wconstab/447/head
2025-12-04T09:17:18.8005698Z  * [new branch]              gh/wconstab/447/orig        -> origin/gh/wconstab/447/orig
2025-12-04T09:17:18.8008388Z  * [new branch]              gh/wconstab/448/base        -> origin/gh/wconstab/448/base
2025-12-04T09:17:18.8010230Z  * [new branch]              gh/wconstab/448/head        -> origin/gh/wconstab/448/head
2025-12-04T09:17:18.8012010Z  * [new branch]              gh/wconstab/448/orig        -> origin/gh/wconstab/448/orig
2025-12-04T09:17:18.8014490Z  * [new branch]              gh/wconstab/449/base        -> origin/gh/wconstab/449/base
2025-12-04T09:17:18.8016327Z  * [new branch]              gh/wconstab/449/head        -> origin/gh/wconstab/449/head
2025-12-04T09:17:18.8019024Z  * [new branch]              gh/wconstab/449/orig        -> origin/gh/wconstab/449/orig
2025-12-04T09:17:18.8021614Z  * [new branch]              gh/wconstab/450/base        -> origin/gh/wconstab/450/base
2025-12-04T09:17:18.8024051Z  * [new branch]              gh/wconstab/450/head        -> origin/gh/wconstab/450/head
2025-12-04T09:17:18.8025893Z  * [new branch]              gh/wconstab/450/orig        -> origin/gh/wconstab/450/orig
2025-12-04T09:17:18.8028249Z  * [new branch]              gh/wconstab/451/base        -> origin/gh/wconstab/451/base
2025-12-04T09:17:18.8030140Z  * [new branch]              gh/wconstab/451/head        -> origin/gh/wconstab/451/head
2025-12-04T09:17:18.8031930Z  * [new branch]              gh/wconstab/451/orig        -> origin/gh/wconstab/451/orig
2025-12-04T09:17:18.8034470Z  * [new branch]              gh/wconstab/452/base        -> origin/gh/wconstab/452/base
2025-12-04T09:17:18.8036215Z  * [new branch]              gh/wconstab/452/head        -> origin/gh/wconstab/452/head
2025-12-04T09:17:18.8038267Z  * [new branch]              gh/wconstab/452/orig        -> origin/gh/wconstab/452/orig
2025-12-04T09:17:18.8040566Z  * [new branch]              gh/wconstab/453/base        -> origin/gh/wconstab/453/base
2025-12-04T09:17:18.8042514Z  * [new branch]              gh/wconstab/453/head        -> origin/gh/wconstab/453/head
2025-12-04T09:17:18.8044860Z  * [new branch]              gh/wconstab/453/orig        -> origin/gh/wconstab/453/orig
2025-12-04T09:17:18.8047260Z  * [new branch]              gh/wconstab/454/base        -> origin/gh/wconstab/454/base
2025-12-04T09:17:18.8049150Z  * [new branch]              gh/wconstab/454/head        -> origin/gh/wconstab/454/head
2025-12-04T09:17:18.8050940Z  * [new branch]              gh/wconstab/454/orig        -> origin/gh/wconstab/454/orig
2025-12-04T09:17:18.8053487Z  * [new branch]              gh/wconstab/455/base        -> origin/gh/wconstab/455/base
2025-12-04T09:17:18.8055335Z  * [new branch]              gh/wconstab/455/head        -> origin/gh/wconstab/455/head
2025-12-04T09:17:18.8057204Z  * [new branch]              gh/wconstab/455/orig        -> origin/gh/wconstab/455/orig
2025-12-04T09:17:18.8060064Z  * [new branch]              gh/wconstab/456/base        -> origin/gh/wconstab/456/base
2025-12-04T09:17:18.8062140Z  * [new branch]              gh/wconstab/456/head        -> origin/gh/wconstab/456/head
2025-12-04T09:17:18.8064031Z  * [new branch]              gh/wconstab/456/orig        -> origin/gh/wconstab/456/orig
2025-12-04T09:17:18.8066633Z  * [new branch]              gh/wconstab/457/base        -> origin/gh/wconstab/457/base
2025-12-04T09:17:18.8068686Z  * [new branch]              gh/wconstab/457/head        -> origin/gh/wconstab/457/head
2025-12-04T09:17:18.8070719Z  * [new branch]              gh/wconstab/457/orig        -> origin/gh/wconstab/457/orig
2025-12-04T09:17:18.8073267Z  * [new branch]              gh/wconstab/458/base        -> origin/gh/wconstab/458/base
2025-12-04T09:17:18.8075106Z  * [new branch]              gh/wconstab/458/head        -> origin/gh/wconstab/458/head
2025-12-04T09:17:18.8076926Z  * [new branch]              gh/wconstab/458/orig        -> origin/gh/wconstab/458/orig
2025-12-04T09:17:18.8079388Z  * [new branch]              gh/wconstab/459/base        -> origin/gh/wconstab/459/base
2025-12-04T09:17:18.8081337Z  * [new branch]              gh/wconstab/459/head        -> origin/gh/wconstab/459/head
2025-12-04T09:17:18.8083087Z  * [new branch]              gh/wconstab/459/orig        -> origin/gh/wconstab/459/orig
2025-12-04T09:17:18.8086382Z  * [new branch]              gh/wconstab/460/base        -> origin/gh/wconstab/460/base
2025-12-04T09:17:18.8088452Z  * [new branch]              gh/wconstab/460/head        -> origin/gh/wconstab/460/head
2025-12-04T09:17:18.8090362Z  * [new branch]              gh/wconstab/460/orig        -> origin/gh/wconstab/460/orig
2025-12-04T09:17:18.8093105Z  * [new branch]              gh/wconstab/461/base        -> origin/gh/wconstab/461/base
2025-12-04T09:17:18.8094994Z  * [new branch]              gh/wconstab/461/head        -> origin/gh/wconstab/461/head
2025-12-04T09:17:18.8096847Z  * [new branch]              gh/wconstab/461/orig        -> origin/gh/wconstab/461/orig
2025-12-04T09:17:18.8099316Z  * [new branch]              gh/wconstab/462/base        -> origin/gh/wconstab/462/base
2025-12-04T09:17:18.8101273Z  * [new branch]              gh/wconstab/462/head        -> origin/gh/wconstab/462/head
2025-12-04T09:17:18.8103170Z  * [new branch]              gh/wconstab/462/orig        -> origin/gh/wconstab/462/orig
2025-12-04T09:17:18.8105882Z  * [new branch]              gh/wconstab/463/base        -> origin/gh/wconstab/463/base
2025-12-04T09:17:18.8107929Z  * [new branch]              gh/wconstab/463/head        -> origin/gh/wconstab/463/head
2025-12-04T09:17:18.8109937Z  * [new branch]              gh/wconstab/463/orig        -> origin/gh/wconstab/463/orig
2025-12-04T09:17:18.8112435Z  * [new branch]              gh/wconstab/464/base        -> origin/gh/wconstab/464/base
2025-12-04T09:17:18.8114513Z  * [new branch]              gh/wconstab/464/head        -> origin/gh/wconstab/464/head
2025-12-04T09:17:18.8116222Z  * [new branch]              gh/wconstab/464/orig        -> origin/gh/wconstab/464/orig
2025-12-04T09:17:18.8118818Z  * [new branch]              gh/wconstab/465/base        -> origin/gh/wconstab/465/base
2025-12-04T09:17:18.8120736Z  * [new branch]              gh/wconstab/465/head        -> origin/gh/wconstab/465/head
2025-12-04T09:17:18.8122488Z  * [new branch]              gh/wconstab/465/orig        -> origin/gh/wconstab/465/orig
2025-12-04T09:17:18.8125152Z  * [new branch]              gh/wconstab/466/base        -> origin/gh/wconstab/466/base
2025-12-04T09:17:18.8126846Z  * [new branch]              gh/wconstab/466/head        -> origin/gh/wconstab/466/head
2025-12-04T09:17:18.8129048Z  * [new branch]              gh/wconstab/466/orig        -> origin/gh/wconstab/466/orig
2025-12-04T09:17:18.8132106Z  * [new branch]              gh/wconstab/467/base        -> origin/gh/wconstab/467/base
2025-12-04T09:17:18.8134022Z  * [new branch]              gh/wconstab/467/head        -> origin/gh/wconstab/467/head
2025-12-04T09:17:18.8135852Z  * [new branch]              gh/wconstab/467/orig        -> origin/gh/wconstab/467/orig
2025-12-04T09:17:18.8138300Z  * [new branch]              gh/wconstab/468/base        -> origin/gh/wconstab/468/base
2025-12-04T09:17:18.8140558Z  * [new branch]              gh/wconstab/468/head        -> origin/gh/wconstab/468/head
2025-12-04T09:17:18.8142172Z  * [new branch]              gh/wconstab/468/orig        -> origin/gh/wconstab/468/orig
2025-12-04T09:17:18.8145334Z  * [new branch]              gh/weifengpy/39/base        -> origin/gh/weifengpy/39/base
2025-12-04T09:17:18.8147090Z  * [new branch]              gh/weifengpy/39/head        -> origin/gh/weifengpy/39/head
2025-12-04T09:17:18.8149057Z  * [new branch]              gh/weifengpy/39/orig        -> origin/gh/weifengpy/39/orig
2025-12-04T09:17:18.8151631Z  * [new branch]              gh/weifengpy/40/base        -> origin/gh/weifengpy/40/base
2025-12-04T09:17:18.8153456Z  * [new branch]              gh/weifengpy/40/head        -> origin/gh/weifengpy/40/head
2025-12-04T09:17:18.8155828Z  * [new branch]              gh/weifengpy/40/orig        -> origin/gh/weifengpy/40/orig
2025-12-04T09:17:18.8158410Z  * [new branch]              gh/weifengpy/41/base        -> origin/gh/weifengpy/41/base
2025-12-04T09:17:18.8160314Z  * [new branch]              gh/weifengpy/41/head        -> origin/gh/weifengpy/41/head
2025-12-04T09:17:18.8162339Z  * [new branch]              gh/weifengpy/41/orig        -> origin/gh/weifengpy/41/orig
2025-12-04T09:17:18.8165472Z  * [new branch]              gh/williamwen42/250/base    -> origin/gh/williamwen42/250/base
2025-12-04T09:17:18.8167471Z  * [new branch]              gh/williamwen42/250/head    -> origin/gh/williamwen42/250/head
2025-12-04T09:17:18.8169292Z  * [new branch]              gh/williamwen42/250/orig    -> origin/gh/williamwen42/250/orig
2025-12-04T09:17:18.8171883Z  * [new branch]              gh/williamwen42/279/base    -> origin/gh/williamwen42/279/base
2025-12-04T09:17:18.8174064Z  * [new branch]              gh/williamwen42/279/head    -> origin/gh/williamwen42/279/head
2025-12-04T09:17:18.8175879Z  * [new branch]              gh/williamwen42/279/orig    -> origin/gh/williamwen42/279/orig
2025-12-04T09:17:18.8179126Z  * [new branch]              gh/williamwen42/282/base    -> origin/gh/williamwen42/282/base
2025-12-04T09:17:18.8181079Z  * [new branch]              gh/williamwen42/282/head    -> origin/gh/williamwen42/282/head
2025-12-04T09:17:18.8182903Z  * [new branch]              gh/williamwen42/282/orig    -> origin/gh/williamwen42/282/orig
2025-12-04T09:17:18.8185429Z  * [new branch]              gh/williamwen42/287/base    -> origin/gh/williamwen42/287/base
2025-12-04T09:17:18.8187304Z  * [new branch]              gh/williamwen42/287/head    -> origin/gh/williamwen42/287/head
2025-12-04T09:17:18.8189172Z  * [new branch]              gh/williamwen42/287/orig    -> origin/gh/williamwen42/287/orig
2025-12-04T09:17:18.8191826Z  * [new branch]              gh/williamwen42/288/base    -> origin/gh/williamwen42/288/base
2025-12-04T09:17:18.8193654Z  * [new branch]              gh/williamwen42/288/head    -> origin/gh/williamwen42/288/head
2025-12-04T09:17:18.8195497Z  * [new branch]              gh/williamwen42/288/orig    -> origin/gh/williamwen42/288/orig
2025-12-04T09:17:18.8198218Z  * [new branch]              gh/williamwen42/296/base    -> origin/gh/williamwen42/296/base
2025-12-04T09:17:18.8200258Z  * [new branch]              gh/williamwen42/296/head    -> origin/gh/williamwen42/296/head
2025-12-04T09:17:18.8202132Z  * [new branch]              gh/williamwen42/296/orig    -> origin/gh/williamwen42/296/orig
2025-12-04T09:17:18.8204571Z  * [new branch]              gh/williamwen42/297/base    -> origin/gh/williamwen42/297/base
2025-12-04T09:17:18.8206527Z  * [new branch]              gh/williamwen42/297/head    -> origin/gh/williamwen42/297/head
2025-12-04T09:17:18.8208681Z  * [new branch]              gh/williamwen42/297/orig    -> origin/gh/williamwen42/297/orig
2025-12-04T09:17:18.8214502Z  * [new branch]              gh/williamwen42/306/base    -> origin/gh/williamwen42/306/base
2025-12-04T09:17:18.8216863Z  * [new branch]              gh/williamwen42/306/head    -> origin/gh/williamwen42/306/head
2025-12-04T09:17:18.8218695Z  * [new branch]              gh/williamwen42/306/orig    -> origin/gh/williamwen42/306/orig
2025-12-04T09:17:18.8221454Z  * [new branch]              gh/williamwen42/309/base    -> origin/gh/williamwen42/309/base
2025-12-04T09:17:18.8223433Z  * [new branch]              gh/williamwen42/309/head    -> origin/gh/williamwen42/309/head
2025-12-04T09:17:18.8225317Z  * [new branch]              gh/williamwen42/309/orig    -> origin/gh/williamwen42/309/orig
2025-12-04T09:17:18.8227747Z  * [new branch]              gh/williamwen42/310/base    -> origin/gh/williamwen42/310/base
2025-12-04T09:17:18.8229573Z  * [new branch]              gh/williamwen42/310/head    -> origin/gh/williamwen42/310/head
2025-12-04T09:17:18.8231440Z  * [new branch]              gh/williamwen42/310/orig    -> origin/gh/williamwen42/310/orig
2025-12-04T09:17:18.8235086Z  * [new branch]              gh/williamwen42/311/base    -> origin/gh/williamwen42/311/base
2025-12-04T09:17:18.8236903Z  * [new branch]              gh/williamwen42/311/head    -> origin/gh/williamwen42/311/head
2025-12-04T09:17:18.8238720Z  * [new branch]              gh/williamwen42/311/orig    -> origin/gh/williamwen42/311/orig
2025-12-04T09:17:18.8241096Z  * [new branch]              gh/williamwen42/319/base    -> origin/gh/williamwen42/319/base
2025-12-04T09:17:18.8243095Z  * [new branch]              gh/williamwen42/319/head    -> origin/gh/williamwen42/319/head
2025-12-04T09:17:18.8245925Z  * [new branch]              gh/williamwen42/319/orig    -> origin/gh/williamwen42/319/orig
2025-12-04T09:17:18.8248317Z  * [new branch]              gh/williamwen42/325/base    -> origin/gh/williamwen42/325/base
2025-12-04T09:17:18.8249868Z  * [new branch]              gh/williamwen42/325/head    -> origin/gh/williamwen42/325/head
2025-12-04T09:17:18.8251710Z  * [new branch]              gh/williamwen42/325/orig    -> origin/gh/williamwen42/325/orig
2025-12-04T09:17:18.8254226Z  * [new branch]              gh/williamwen42/326/base    -> origin/gh/williamwen42/326/base
2025-12-04T09:17:18.8256231Z  * [new branch]              gh/williamwen42/326/head    -> origin/gh/williamwen42/326/head
2025-12-04T09:17:18.8258031Z  * [new branch]              gh/williamwen42/326/orig    -> origin/gh/williamwen42/326/orig
2025-12-04T09:17:18.8260811Z  * [new branch]              gh/williamwen42/327/base    -> origin/gh/williamwen42/327/base
2025-12-04T09:17:18.8262639Z  * [new branch]              gh/williamwen42/327/head    -> origin/gh/williamwen42/327/head
2025-12-04T09:17:18.8264422Z  * [new branch]              gh/williamwen42/327/orig    -> origin/gh/williamwen42/327/orig
2025-12-04T09:17:18.8267028Z  * [new branch]              gh/williamwen42/328/base    -> origin/gh/williamwen42/328/base
2025-12-04T09:17:18.8269068Z  * [new branch]              gh/williamwen42/328/head    -> origin/gh/williamwen42/328/head
2025-12-04T09:17:18.8270763Z  * [new branch]              gh/williamwen42/328/orig    -> origin/gh/williamwen42/328/orig
2025-12-04T09:17:18.8273885Z  * [new branch]              gh/williamwen42/329/base    -> origin/gh/williamwen42/329/base
2025-12-04T09:17:18.8276019Z  * [new branch]              gh/williamwen42/329/head    -> origin/gh/williamwen42/329/head
2025-12-04T09:17:18.8277973Z  * [new branch]              gh/williamwen42/329/orig    -> origin/gh/williamwen42/329/orig
2025-12-04T09:17:18.8280531Z  * [new branch]              gh/williamwen42/330/base    -> origin/gh/williamwen42/330/base
2025-12-04T09:17:18.8282371Z  * [new branch]              gh/williamwen42/330/head    -> origin/gh/williamwen42/330/head
2025-12-04T09:17:18.8284190Z  * [new branch]              gh/williamwen42/330/orig    -> origin/gh/williamwen42/330/orig
2025-12-04T09:17:18.8286815Z  * [new branch]              gh/williamwen42/331/base    -> origin/gh/williamwen42/331/base
2025-12-04T09:17:18.8288652Z  * [new branch]              gh/williamwen42/331/head    -> origin/gh/williamwen42/331/head
2025-12-04T09:17:18.8290496Z  * [new branch]              gh/williamwen42/331/orig    -> origin/gh/williamwen42/331/orig
2025-12-04T09:17:18.8293010Z  * [new branch]              gh/williamwen42/332/base    -> origin/gh/williamwen42/332/base
2025-12-04T09:17:18.8294969Z  * [new branch]              gh/williamwen42/332/head    -> origin/gh/williamwen42/332/head
2025-12-04T09:17:18.8296793Z  * [new branch]              gh/williamwen42/332/orig    -> origin/gh/williamwen42/332/orig
2025-12-04T09:17:18.8299569Z  * [new branch]              gh/williamwen42/333/base    -> origin/gh/williamwen42/333/base
2025-12-04T09:17:18.8301553Z  * [new branch]              gh/williamwen42/333/head    -> origin/gh/williamwen42/333/head
2025-12-04T09:17:18.8303422Z  * [new branch]              gh/williamwen42/333/orig    -> origin/gh/williamwen42/333/orig
2025-12-04T09:17:18.8306037Z  * [new branch]              gh/williamwen42/334/base    -> origin/gh/williamwen42/334/base
2025-12-04T09:17:18.8308003Z  * [new branch]              gh/williamwen42/334/head    -> origin/gh/williamwen42/334/head
2025-12-04T09:17:18.8310054Z  * [new branch]              gh/williamwen42/334/orig    -> origin/gh/williamwen42/334/orig
2025-12-04T09:17:18.8315971Z  * [new branch]              gh/williamwen42/335/base    -> origin/gh/williamwen42/335/base
2025-12-04T09:17:18.8318308Z  * [new branch]              gh/williamwen42/335/head    -> origin/gh/williamwen42/335/head
2025-12-04T09:17:18.8320382Z  * [new branch]              gh/williamwen42/335/orig    -> origin/gh/williamwen42/335/orig
2025-12-04T09:17:18.8323139Z  * [new branch]              gh/williamwen42/336/base    -> origin/gh/williamwen42/336/base
2025-12-04T09:17:18.8324930Z  * [new branch]              gh/williamwen42/336/head    -> origin/gh/williamwen42/336/head
2025-12-04T09:17:18.8326679Z  * [new branch]              gh/williamwen42/336/orig    -> origin/gh/williamwen42/336/orig
2025-12-04T09:17:18.8329255Z  * [new branch]              gh/williamwen42/337/base    -> origin/gh/williamwen42/337/base
2025-12-04T09:17:18.8331216Z  * [new branch]              gh/williamwen42/337/head    -> origin/gh/williamwen42/337/head
2025-12-04T09:17:18.8333030Z  * [new branch]              gh/williamwen42/337/orig    -> origin/gh/williamwen42/337/orig
2025-12-04T09:17:18.8335702Z  * [new branch]              gh/williamwen42/338/base    -> origin/gh/williamwen42/338/base
2025-12-04T09:17:18.8337547Z  * [new branch]              gh/williamwen42/338/head    -> origin/gh/williamwen42/338/head
2025-12-04T09:17:18.8339429Z  * [new branch]              gh/williamwen42/338/orig    -> origin/gh/williamwen42/338/orig
2025-12-04T09:17:18.8342105Z  * [new branch]              gh/williamwen42/339/base    -> origin/gh/williamwen42/339/base
2025-12-04T09:17:18.8344045Z  * [new branch]              gh/williamwen42/339/head    -> origin/gh/williamwen42/339/head
2025-12-04T09:17:18.8345715Z  * [new branch]              gh/williamwen42/339/orig    -> origin/gh/williamwen42/339/orig
2025-12-04T09:17:18.8348389Z  * [new branch]              gh/williamwen42/340/base    -> origin/gh/williamwen42/340/base
2025-12-04T09:17:18.8350148Z  * [new branch]              gh/williamwen42/340/head    -> origin/gh/williamwen42/340/head
2025-12-04T09:17:18.8351977Z  * [new branch]              gh/williamwen42/340/orig    -> origin/gh/williamwen42/340/orig
2025-12-04T09:17:18.8354626Z  * [new branch]              gh/williamwen42/341/base    -> origin/gh/williamwen42/341/base
2025-12-04T09:17:18.8356598Z  * [new branch]              gh/williamwen42/341/head    -> origin/gh/williamwen42/341/head
2025-12-04T09:17:18.8358413Z  * [new branch]              gh/williamwen42/341/orig    -> origin/gh/williamwen42/341/orig
2025-12-04T09:17:18.8360998Z  * [new branch]              gh/williamwen42/342/base    -> origin/gh/williamwen42/342/base
2025-12-04T09:17:18.8362826Z  * [new branch]              gh/williamwen42/342/head    -> origin/gh/williamwen42/342/head
2025-12-04T09:17:18.8364650Z  * [new branch]              gh/williamwen42/342/orig    -> origin/gh/williamwen42/342/orig
2025-12-04T09:17:18.8367286Z  * [new branch]              gh/williamwen42/343/base    -> origin/gh/williamwen42/343/base
2025-12-04T09:17:18.8369158Z  * [new branch]              gh/williamwen42/343/head    -> origin/gh/williamwen42/343/head
2025-12-04T09:17:18.8370962Z  * [new branch]              gh/williamwen42/343/orig    -> origin/gh/williamwen42/343/orig
2025-12-04T09:17:18.8373599Z  * [new branch]              gh/williamwen42/344/base    -> origin/gh/williamwen42/344/base
2025-12-04T09:17:18.8375403Z  * [new branch]              gh/williamwen42/344/head    -> origin/gh/williamwen42/344/head
2025-12-04T09:17:18.8377206Z  * [new branch]              gh/williamwen42/344/orig    -> origin/gh/williamwen42/344/orig
2025-12-04T09:17:18.8379955Z  * [new branch]              gh/williamwen42/345/base    -> origin/gh/williamwen42/345/base
2025-12-04T09:17:18.8381925Z  * [new branch]              gh/williamwen42/345/head    -> origin/gh/williamwen42/345/head
2025-12-04T09:17:18.8383721Z  * [new branch]              gh/williamwen42/345/orig    -> origin/gh/williamwen42/345/orig
2025-12-04T09:17:18.8386302Z  * [new branch]              gh/williamwen42/346/base    -> origin/gh/williamwen42/346/base
2025-12-04T09:17:18.8388242Z  * [new branch]              gh/williamwen42/346/head    -> origin/gh/williamwen42/346/head
2025-12-04T09:17:18.8390224Z  * [new branch]              gh/williamwen42/346/orig    -> origin/gh/williamwen42/346/orig
2025-12-04T09:17:18.8392810Z  * [new branch]              gh/williamwen42/347/base    -> origin/gh/williamwen42/347/base
2025-12-04T09:17:18.8394575Z  * [new branch]              gh/williamwen42/347/head    -> origin/gh/williamwen42/347/head
2025-12-04T09:17:18.8396390Z  * [new branch]              gh/williamwen42/347/orig    -> origin/gh/williamwen42/347/orig
2025-12-04T09:17:18.8398951Z  * [new branch]              gh/williamwen42/348/base    -> origin/gh/williamwen42/348/base
2025-12-04T09:17:18.8400692Z  * [new branch]              gh/williamwen42/348/head    -> origin/gh/williamwen42/348/head
2025-12-04T09:17:18.8402504Z  * [new branch]              gh/williamwen42/348/orig    -> origin/gh/williamwen42/348/orig
2025-12-04T09:17:18.8404939Z  * [new branch]              gh/williamwen42/349/base    -> origin/gh/williamwen42/349/base
2025-12-04T09:17:18.8406879Z  * [new branch]              gh/williamwen42/349/head    -> origin/gh/williamwen42/349/head
2025-12-04T09:17:18.8408849Z  * [new branch]              gh/williamwen42/349/orig    -> origin/gh/williamwen42/349/orig
2025-12-04T09:17:18.8411516Z  * [new branch]              gh/williamwen42/350/base    -> origin/gh/williamwen42/350/base
2025-12-04T09:17:18.8413333Z  * [new branch]              gh/williamwen42/350/head    -> origin/gh/williamwen42/350/head
2025-12-04T09:17:18.8415318Z  * [new branch]              gh/williamwen42/350/orig    -> origin/gh/williamwen42/350/orig
2025-12-04T09:17:18.8417865Z  * [new branch]              gh/williamwen42/351/base    -> origin/gh/williamwen42/351/base
2025-12-04T09:17:18.8420003Z  * [new branch]              gh/williamwen42/351/head    -> origin/gh/williamwen42/351/head
2025-12-04T09:17:18.8422147Z  * [new branch]              gh/williamwen42/351/orig    -> origin/gh/williamwen42/351/orig
2025-12-04T09:17:18.8425058Z  * [new branch]              gh/williamwen42/352/base    -> origin/gh/williamwen42/352/base
2025-12-04T09:17:18.8426793Z  * [new branch]              gh/williamwen42/352/head    -> origin/gh/williamwen42/352/head
2025-12-04T09:17:18.8428667Z  * [new branch]              gh/williamwen42/352/orig    -> origin/gh/williamwen42/352/orig
2025-12-04T09:17:18.8431362Z  * [new branch]              gh/williamwen42/353/base    -> origin/gh/williamwen42/353/base
2025-12-04T09:17:18.8433329Z  * [new branch]              gh/williamwen42/353/head    -> origin/gh/williamwen42/353/head
2025-12-04T09:17:18.8435308Z  * [new branch]              gh/williamwen42/353/orig    -> origin/gh/williamwen42/353/orig
2025-12-04T09:17:18.8437678Z  * [new branch]              gh/williamwen42/354/base    -> origin/gh/williamwen42/354/base
2025-12-04T09:17:18.8439605Z  * [new branch]              gh/williamwen42/354/head    -> origin/gh/williamwen42/354/head
2025-12-04T09:17:18.8441410Z  * [new branch]              gh/williamwen42/354/orig    -> origin/gh/williamwen42/354/orig
2025-12-04T09:17:18.8443996Z  * [new branch]              gh/williamwen42/355/base    -> origin/gh/williamwen42/355/base
2025-12-04T09:17:18.8445798Z  * [new branch]              gh/williamwen42/355/head    -> origin/gh/williamwen42/355/head
2025-12-04T09:17:18.8447641Z  * [new branch]              gh/williamwen42/355/orig    -> origin/gh/williamwen42/355/orig
2025-12-04T09:17:18.8450222Z  * [new branch]              gh/williamwen42/356/base    -> origin/gh/williamwen42/356/base
2025-12-04T09:17:18.8452130Z  * [new branch]              gh/williamwen42/356/head    -> origin/gh/williamwen42/356/head
2025-12-04T09:17:18.8453974Z  * [new branch]              gh/williamwen42/356/orig    -> origin/gh/williamwen42/356/orig
2025-12-04T09:17:18.8456574Z  * [new branch]              gh/williamwen42/357/base    -> origin/gh/williamwen42/357/base
2025-12-04T09:17:18.8458526Z  * [new branch]              gh/williamwen42/357/head    -> origin/gh/williamwen42/357/head
2025-12-04T09:17:18.8460521Z  * [new branch]              gh/williamwen42/357/orig    -> origin/gh/williamwen42/357/orig
2025-12-04T09:17:18.8463083Z  * [new branch]              gh/williamwen42/358/base    -> origin/gh/williamwen42/358/base
2025-12-04T09:17:18.8464875Z  * [new branch]              gh/williamwen42/358/head    -> origin/gh/williamwen42/358/head
2025-12-04T09:17:18.8466782Z  * [new branch]              gh/williamwen42/358/orig    -> origin/gh/williamwen42/358/orig
2025-12-04T09:17:18.8469781Z  * [new branch]              gh/xmfan/169/base           -> origin/gh/xmfan/169/base
2025-12-04T09:17:18.8471633Z  * [new branch]              gh/xmfan/169/head           -> origin/gh/xmfan/169/head
2025-12-04T09:17:18.8474063Z  * [new branch]              gh/xmfan/170/base           -> origin/gh/xmfan/170/base
2025-12-04T09:17:18.8475861Z  * [new branch]              gh/xmfan/170/head           -> origin/gh/xmfan/170/head
2025-12-04T09:17:18.8478345Z  * [new branch]              gh/xmfan/274/base           -> origin/gh/xmfan/274/base
2025-12-04T09:17:18.8480126Z  * [new branch]              gh/xmfan/274/head           -> origin/gh/xmfan/274/head
2025-12-04T09:17:18.8482067Z  * [new branch]              gh/xmfan/274/orig           -> origin/gh/xmfan/274/orig
2025-12-04T09:17:18.8484492Z  * [new branch]              gh/xmfan/277/base           -> origin/gh/xmfan/277/base
2025-12-04T09:17:18.8486290Z  * [new branch]              gh/xmfan/277/head           -> origin/gh/xmfan/277/head
2025-12-04T09:17:18.8488131Z  * [new branch]              gh/xmfan/277/orig           -> origin/gh/xmfan/277/orig
2025-12-04T09:17:18.8490779Z  * [new branch]              gh/xmfan/301/base           -> origin/gh/xmfan/301/base
2025-12-04T09:17:18.8492453Z  * [new branch]              gh/xmfan/301/head           -> origin/gh/xmfan/301/head
2025-12-04T09:17:18.8494183Z  * [new branch]              gh/xmfan/301/orig           -> origin/gh/xmfan/301/orig
2025-12-04T09:17:18.8496675Z  * [new branch]              gh/xmfan/304/base           -> origin/gh/xmfan/304/base
2025-12-04T09:17:18.8499061Z  * [new branch]              gh/xmfan/304/head           -> origin/gh/xmfan/304/head
2025-12-04T09:17:18.8500920Z  * [new branch]              gh/xmfan/304/orig           -> origin/gh/xmfan/304/orig
2025-12-04T09:17:18.8503381Z  * [new branch]              gh/xmfan/309/base           -> origin/gh/xmfan/309/base
2025-12-04T09:17:18.8505172Z  * [new branch]              gh/xmfan/309/head           -> origin/gh/xmfan/309/head
2025-12-04T09:17:18.8507121Z  * [new branch]              gh/xmfan/309/orig           -> origin/gh/xmfan/309/orig
2025-12-04T09:17:18.8510367Z  * [new branch]              gh/xmfan/310/base           -> origin/gh/xmfan/310/base
2025-12-04T09:17:18.8512008Z  * [new branch]              gh/xmfan/310/head           -> origin/gh/xmfan/310/head
2025-12-04T09:17:18.8513814Z  * [new branch]              gh/xmfan/310/orig           -> origin/gh/xmfan/310/orig
2025-12-04T09:17:18.8516329Z  * [new branch]              gh/xmfan/311/base           -> origin/gh/xmfan/311/base
2025-12-04T09:17:18.8518135Z  * [new branch]              gh/xmfan/311/head           -> origin/gh/xmfan/311/head
2025-12-04T09:17:18.8519949Z  * [new branch]              gh/xmfan/311/orig           -> origin/gh/xmfan/311/orig
2025-12-04T09:17:18.8522960Z  * [new branch]              gh/xmfan/312/base           -> origin/gh/xmfan/312/base
2025-12-04T09:17:18.8524777Z  * [new branch]              gh/xmfan/312/head           -> origin/gh/xmfan/312/head
2025-12-04T09:17:18.8526594Z  * [new branch]              gh/xmfan/312/orig           -> origin/gh/xmfan/312/orig
2025-12-04T09:17:18.8529139Z  * [new branch]              gh/xmfan/313/base           -> origin/gh/xmfan/313/base
2025-12-04T09:17:18.8530942Z  * [new branch]              gh/xmfan/313/head           -> origin/gh/xmfan/313/head
2025-12-04T09:17:18.8532897Z  * [new branch]              gh/xmfan/313/orig           -> origin/gh/xmfan/313/orig
2025-12-04T09:17:18.8535935Z  * [new branch]              gh/xuanzhang816/27/base     -> origin/gh/xuanzhang816/27/base
2025-12-04T09:17:18.8537743Z  * [new branch]              gh/xuanzhang816/27/head     -> origin/gh/xuanzhang816/27/head
2025-12-04T09:17:18.8539731Z  * [new branch]              gh/xuanzhang816/27/orig     -> origin/gh/xuanzhang816/27/orig
2025-12-04T09:17:18.8542428Z  * [new branch]              gh/xuanzhang816/32/base     -> origin/gh/xuanzhang816/32/base
2025-12-04T09:17:18.8544078Z  * [new branch]              gh/xuanzhang816/32/head     -> origin/gh/xuanzhang816/32/head
2025-12-04T09:17:18.8545879Z  * [new branch]              gh/xuanzhang816/32/orig     -> origin/gh/xuanzhang816/32/orig
2025-12-04T09:17:18.8548431Z  * [new branch]              gh/xuanzhang816/33/base     -> origin/gh/xuanzhang816/33/base
2025-12-04T09:17:18.8550268Z  * [new branch]              gh/xuanzhang816/33/head     -> origin/gh/xuanzhang816/33/head
2025-12-04T09:17:18.8552099Z  * [new branch]              gh/xuanzhang816/33/orig     -> origin/gh/xuanzhang816/33/orig
2025-12-04T09:17:18.8554905Z  * [new branch]              gh/xuanzhang816/34/base     -> origin/gh/xuanzhang816/34/base
2025-12-04T09:17:18.8556834Z  * [new branch]              gh/xuanzhang816/34/head     -> origin/gh/xuanzhang816/34/head
2025-12-04T09:17:18.8558662Z  * [new branch]              gh/xuanzhang816/34/orig     -> origin/gh/xuanzhang816/34/orig
2025-12-04T09:17:18.8561377Z  * [new branch]              gh/xuanzhang816/35/base     -> origin/gh/xuanzhang816/35/base
2025-12-04T09:17:18.8563197Z  * [new branch]              gh/xuanzhang816/35/head     -> origin/gh/xuanzhang816/35/head
2025-12-04T09:17:18.8565184Z  * [new branch]              gh/xuanzhang816/35/orig     -> origin/gh/xuanzhang816/35/orig
2025-12-04T09:17:18.8568105Z  * [new branch]              gh/yanbing-j/11/base        -> origin/gh/yanbing-j/11/base
2025-12-04T09:17:18.8569906Z  * [new branch]              gh/yanbing-j/11/head        -> origin/gh/yanbing-j/11/head
2025-12-04T09:17:18.8571988Z  * [new branch]              gh/yanbing-j/11/orig        -> origin/gh/yanbing-j/11/orig
2025-12-04T09:17:18.8574498Z  * [new branch]              gh/yanbing-j/12/base        -> origin/gh/yanbing-j/12/base
2025-12-04T09:17:18.8576350Z  * [new branch]              gh/yanbing-j/12/head        -> origin/gh/yanbing-j/12/head
2025-12-04T09:17:18.8578157Z  * [new branch]              gh/yanbing-j/12/orig        -> origin/gh/yanbing-j/12/orig
2025-12-04T09:17:18.8580934Z  * [new branch]              gh/yanbing-j/13/base        -> origin/gh/yanbing-j/13/base
2025-12-04T09:17:18.8582751Z  * [new branch]              gh/yanbing-j/13/head        -> origin/gh/yanbing-j/13/head
2025-12-04T09:17:18.8584555Z  * [new branch]              gh/yanbing-j/13/orig        -> origin/gh/yanbing-j/13/orig
2025-12-04T09:17:18.8586978Z  * [new branch]              gh/yanbing-j/14/base        -> origin/gh/yanbing-j/14/base
2025-12-04T09:17:18.8588825Z  * [new branch]              gh/yanbing-j/14/head        -> origin/gh/yanbing-j/14/head
2025-12-04T09:17:18.8590720Z  * [new branch]              gh/yanbing-j/14/orig        -> origin/gh/yanbing-j/14/orig
2025-12-04T09:17:18.8593067Z  * [new branch]              gh/yanbing-j/15/base        -> origin/gh/yanbing-j/15/base
2025-12-04T09:17:18.8594917Z  * [new branch]              gh/yanbing-j/15/head        -> origin/gh/yanbing-j/15/head
2025-12-04T09:17:18.8596707Z  * [new branch]              gh/yanbing-j/15/orig        -> origin/gh/yanbing-j/15/orig
2025-12-04T09:17:18.8599164Z  * [new branch]              gh/yanbing-j/18/base        -> origin/gh/yanbing-j/18/base
2025-12-04T09:17:18.8600941Z  * [new branch]              gh/yanbing-j/18/head        -> origin/gh/yanbing-j/18/head
2025-12-04T09:17:18.8602765Z  * [new branch]              gh/yanbing-j/18/orig        -> origin/gh/yanbing-j/18/orig
2025-12-04T09:17:18.8605331Z  * [new branch]              gh/yanbing-j/19/base        -> origin/gh/yanbing-j/19/base
2025-12-04T09:17:18.8607193Z  * [new branch]              gh/yanbing-j/19/head        -> origin/gh/yanbing-j/19/head
2025-12-04T09:17:18.8612133Z  * [new branch]              gh/yanbing-j/19/orig        -> origin/gh/yanbing-j/19/orig
2025-12-04T09:17:18.8614681Z  * [new branch]              gh/yanbing-j/20/base        -> origin/gh/yanbing-j/20/base
2025-12-04T09:17:18.8616506Z  * [new branch]              gh/yanbing-j/20/head        -> origin/gh/yanbing-j/20/head
2025-12-04T09:17:18.8618359Z  * [new branch]              gh/yanbing-j/20/orig        -> origin/gh/yanbing-j/20/orig
2025-12-04T09:17:18.8621058Z  * [new branch]              gh/yanbing-j/21/base        -> origin/gh/yanbing-j/21/base
2025-12-04T09:17:18.8622865Z  * [new branch]              gh/yanbing-j/21/head        -> origin/gh/yanbing-j/21/head
2025-12-04T09:17:18.8625359Z  * [new branch]              gh/yanbing-j/22/base        -> origin/gh/yanbing-j/22/base
2025-12-04T09:17:18.8627113Z  * [new branch]              gh/yanbing-j/22/head        -> origin/gh/yanbing-j/22/head
2025-12-04T09:17:18.8628977Z  * [new branch]              gh/yanbing-j/22/orig        -> origin/gh/yanbing-j/22/orig
2025-12-04T09:17:18.8631349Z  * [new branch]              gh/yanbing-j/23/base        -> origin/gh/yanbing-j/23/base
2025-12-04T09:17:18.8633394Z  * [new branch]              gh/yanbing-j/23/head        -> origin/gh/yanbing-j/23/head
2025-12-04T09:17:18.8635195Z  * [new branch]              gh/yanbing-j/23/orig        -> origin/gh/yanbing-j/23/orig
2025-12-04T09:17:18.8637746Z  * [new branch]              gh/yanbing-j/24/base        -> origin/gh/yanbing-j/24/base
2025-12-04T09:17:18.8639709Z  * [new branch]              gh/yanbing-j/24/head        -> origin/gh/yanbing-j/24/head
2025-12-04T09:17:18.8641698Z  * [new branch]              gh/yanbing-j/24/orig        -> origin/gh/yanbing-j/24/orig
2025-12-04T09:17:18.8644485Z  * [new branch]              gh/yanbing-j/25/base        -> origin/gh/yanbing-j/25/base
2025-12-04T09:17:18.8646412Z  * [new branch]              gh/yanbing-j/25/head        -> origin/gh/yanbing-j/25/head
2025-12-04T09:17:18.8648323Z  * [new branch]              gh/yanbing-j/25/orig        -> origin/gh/yanbing-j/25/orig
2025-12-04T09:17:18.8650669Z  * [new branch]              gh/yanbing-j/26/base        -> origin/gh/yanbing-j/26/base
2025-12-04T09:17:18.8652515Z  * [new branch]              gh/yanbing-j/26/head        -> origin/gh/yanbing-j/26/head
2025-12-04T09:17:18.8654406Z  * [new branch]              gh/yanbing-j/26/orig        -> origin/gh/yanbing-j/26/orig
2025-12-04T09:17:18.8658072Z  * [new branch]              gh/yang-yu-hang/1/base      -> origin/gh/yang-yu-hang/1/base
2025-12-04T09:17:18.8660470Z  * [new branch]              gh/yang-yu-hang/1/head      -> origin/gh/yang-yu-hang/1/head
2025-12-04T09:17:18.8662269Z  * [new branch]              gh/yang-yu-hang/1/orig      -> origin/gh/yang-yu-hang/1/orig
2025-12-04T09:17:18.8664678Z  * [new branch]              gh/yang-yu-hang/2/base      -> origin/gh/yang-yu-hang/2/base
2025-12-04T09:17:18.8666855Z  * [new branch]              gh/yang-yu-hang/2/head      -> origin/gh/yang-yu-hang/2/head
2025-12-04T09:17:18.8668734Z  * [new branch]              gh/yang-yu-hang/2/orig      -> origin/gh/yang-yu-hang/2/orig
2025-12-04T09:17:18.8671329Z  * [new branch]              gh/yang-yu-hang/3/base      -> origin/gh/yang-yu-hang/3/base
2025-12-04T09:17:18.8673220Z  * [new branch]              gh/yang-yu-hang/3/head      -> origin/gh/yang-yu-hang/3/head
2025-12-04T09:17:18.8675077Z  * [new branch]              gh/yang-yu-hang/3/orig      -> origin/gh/yang-yu-hang/3/orig
2025-12-04T09:17:18.8678141Z  * [new branch]              gh/yangw-dev/12/base        -> origin/gh/yangw-dev/12/base
2025-12-04T09:17:18.8679990Z  * [new branch]              gh/yangw-dev/12/head        -> origin/gh/yangw-dev/12/head
2025-12-04T09:17:18.8681826Z  * [new branch]              gh/yangw-dev/12/orig        -> origin/gh/yangw-dev/12/orig
2025-12-04T09:17:18.8684386Z  * [new branch]              gh/yangw-dev/13/base        -> origin/gh/yangw-dev/13/base
2025-12-04T09:17:18.8686292Z  * [new branch]              gh/yangw-dev/13/head        -> origin/gh/yangw-dev/13/head
2025-12-04T09:17:18.8688078Z  * [new branch]              gh/yangw-dev/13/orig        -> origin/gh/yangw-dev/13/orig
2025-12-04T09:17:18.8690506Z  * [new branch]              gh/yangw-dev/14/base        -> origin/gh/yangw-dev/14/base
2025-12-04T09:17:18.8692383Z  * [new branch]              gh/yangw-dev/14/head        -> origin/gh/yangw-dev/14/head
2025-12-04T09:17:18.8694586Z  * [new branch]              gh/yangw-dev/14/orig        -> origin/gh/yangw-dev/14/orig
2025-12-04T09:17:18.8697105Z  * [new branch]              gh/yangw-dev/15/base        -> origin/gh/yangw-dev/15/base
2025-12-04T09:17:18.8698925Z  * [new branch]              gh/yangw-dev/15/head        -> origin/gh/yangw-dev/15/head
2025-12-04T09:17:18.8701029Z  * [new branch]              gh/yangw-dev/15/orig        -> origin/gh/yangw-dev/15/orig
2025-12-04T09:17:18.8703511Z  * [new branch]              gh/yangw-dev/19/base        -> origin/gh/yangw-dev/19/base
2025-12-04T09:17:18.8705177Z  * [new branch]              gh/yangw-dev/19/head        -> origin/gh/yangw-dev/19/head
2025-12-04T09:17:18.8707062Z  * [new branch]              gh/yangw-dev/19/orig        -> origin/gh/yangw-dev/19/orig
2025-12-04T09:17:18.8709939Z  * [new branch]              gh/yangw-dev/26/base        -> origin/gh/yangw-dev/26/base
2025-12-04T09:17:18.8712128Z  * [new branch]              gh/yangw-dev/26/head        -> origin/gh/yangw-dev/26/head
2025-12-04T09:17:18.8713913Z  * [new branch]              gh/yangw-dev/26/orig        -> origin/gh/yangw-dev/26/orig
2025-12-04T09:17:18.8716351Z  * [new branch]              gh/yangw-dev/27/base        -> origin/gh/yangw-dev/27/base
2025-12-04T09:17:18.8718483Z  * [new branch]              gh/yangw-dev/27/head        -> origin/gh/yangw-dev/27/head
2025-12-04T09:17:18.8720860Z  * [new branch]              gh/yangw-dev/27/orig        -> origin/gh/yangw-dev/27/orig
2025-12-04T09:17:18.8723879Z  * [new branch]              gh/ydwu4/292/base           -> origin/gh/ydwu4/292/base
2025-12-04T09:17:18.8725544Z  * [new branch]              gh/ydwu4/292/head           -> origin/gh/ydwu4/292/head
2025-12-04T09:17:18.8727845Z  * [new branch]              gh/ydwu4/292/orig           -> origin/gh/ydwu4/292/orig
2025-12-04T09:17:18.8730449Z  * [new branch]              gh/ydwu4/294/base           -> origin/gh/ydwu4/294/base
2025-12-04T09:17:18.8732371Z  * [new branch]              gh/ydwu4/294/head           -> origin/gh/ydwu4/294/head
2025-12-04T09:17:18.8734671Z  * [new branch]              gh/ydwu4/294/orig           -> origin/gh/ydwu4/294/orig
2025-12-04T09:17:18.8737358Z  * [new branch]              gh/ydwu4/295/base           -> origin/gh/ydwu4/295/base
2025-12-04T09:17:18.8739195Z  * [new branch]              gh/ydwu4/295/head           -> origin/gh/ydwu4/295/head
2025-12-04T09:17:18.8741227Z  * [new branch]              gh/ydwu4/295/orig           -> origin/gh/ydwu4/295/orig
2025-12-04T09:17:18.8743716Z  * [new branch]              gh/ydwu4/296/base           -> origin/gh/ydwu4/296/base
2025-12-04T09:17:18.8745636Z  * [new branch]              gh/ydwu4/296/head           -> origin/gh/ydwu4/296/head
2025-12-04T09:17:18.8747364Z  * [new branch]              gh/ydwu4/296/orig           -> origin/gh/ydwu4/296/orig
2025-12-04T09:17:18.8750032Z  * [new branch]              gh/ydwu4/306/base           -> origin/gh/ydwu4/306/base
2025-12-04T09:17:18.8751958Z  * [new branch]              gh/ydwu4/306/head           -> origin/gh/ydwu4/306/head
2025-12-04T09:17:18.8754124Z  * [new branch]              gh/ydwu4/306/orig           -> origin/gh/ydwu4/306/orig
2025-12-04T09:17:18.8756483Z  * [new branch]              gh/ydwu4/312/base           -> origin/gh/ydwu4/312/base
2025-12-04T09:17:18.8758447Z  * [new branch]              gh/ydwu4/312/head           -> origin/gh/ydwu4/312/head
2025-12-04T09:17:18.8760082Z  * [new branch]              gh/ydwu4/312/orig           -> origin/gh/ydwu4/312/orig
2025-12-04T09:17:18.8762844Z  * [new branch]              gh/ydwu4/322/base           -> origin/gh/ydwu4/322/base
2025-12-04T09:17:18.8764709Z  * [new branch]              gh/ydwu4/322/head           -> origin/gh/ydwu4/322/head
2025-12-04T09:17:18.8766034Z  * [new branch]              gh/ydwu4/322/orig           -> origin/gh/ydwu4/322/orig
2025-12-04T09:17:18.8768855Z  * [new branch]              gh/ydwu4/327/base           -> origin/gh/ydwu4/327/base
2025-12-04T09:17:18.8770851Z  * [new branch]              gh/ydwu4/327/head           -> origin/gh/ydwu4/327/head
2025-12-04T09:17:18.8772193Z  * [new branch]              gh/ydwu4/327/orig           -> origin/gh/ydwu4/327/orig
2025-12-04T09:17:18.8775127Z  * [new branch]              gh/ydwu4/328/base           -> origin/gh/ydwu4/328/base
2025-12-04T09:17:18.8777015Z  * [new branch]              gh/ydwu4/328/head           -> origin/gh/ydwu4/328/head
2025-12-04T09:17:18.8779074Z  * [new branch]              gh/ydwu4/328/orig           -> origin/gh/ydwu4/328/orig
2025-12-04T09:17:18.8781476Z  * [new branch]              gh/ydwu4/329/base           -> origin/gh/ydwu4/329/base
2025-12-04T09:17:18.8783230Z  * [new branch]              gh/ydwu4/329/head           -> origin/gh/ydwu4/329/head
2025-12-04T09:17:18.8785266Z  * [new branch]              gh/ydwu4/329/orig           -> origin/gh/ydwu4/329/orig
2025-12-04T09:17:18.8787782Z  * [new branch]              gh/ydwu4/330/base           -> origin/gh/ydwu4/330/base
2025-12-04T09:17:18.8789380Z  * [new branch]              gh/ydwu4/330/head           -> origin/gh/ydwu4/330/head
2025-12-04T09:17:18.8791469Z  * [new branch]              gh/ydwu4/330/orig           -> origin/gh/ydwu4/330/orig
2025-12-04T09:17:18.8794221Z  * [new branch]              gh/ydwu4/331/base           -> origin/gh/ydwu4/331/base
2025-12-04T09:17:18.8796354Z  * [new branch]              gh/ydwu4/331/head           -> origin/gh/ydwu4/331/head
2025-12-04T09:17:18.8797519Z  * [new branch]              gh/ydwu4/331/orig           -> origin/gh/ydwu4/331/orig
2025-12-04T09:17:18.8800224Z  * [new branch]              gh/ydwu4/332/base           -> origin/gh/ydwu4/332/base
2025-12-04T09:17:18.8801952Z  * [new branch]              gh/ydwu4/332/head           -> origin/gh/ydwu4/332/head
2025-12-04T09:17:18.8804068Z  * [new branch]              gh/ydwu4/332/orig           -> origin/gh/ydwu4/332/orig
2025-12-04T09:17:18.8806435Z  * [new branch]              gh/ydwu4/333/base           -> origin/gh/ydwu4/333/base
2025-12-04T09:17:18.8808228Z  * [new branch]              gh/ydwu4/333/head           -> origin/gh/ydwu4/333/head
2025-12-04T09:17:18.8813777Z  * [new branch]              gh/ydwu4/333/orig           -> origin/gh/ydwu4/333/orig
2025-12-04T09:17:18.8816092Z  * [new branch]              gh/ydwu4/334/base           -> origin/gh/ydwu4/334/base
2025-12-04T09:17:18.8817931Z  * [new branch]              gh/ydwu4/334/head           -> origin/gh/ydwu4/334/head
2025-12-04T09:17:18.8819894Z  * [new branch]              gh/ydwu4/334/orig           -> origin/gh/ydwu4/334/orig
2025-12-04T09:17:18.8822476Z  * [new branch]              gh/ydwu4/335/base           -> origin/gh/ydwu4/335/base
2025-12-04T09:17:18.8824577Z  * [new branch]              gh/ydwu4/335/head           -> origin/gh/ydwu4/335/head
2025-12-04T09:17:18.8825856Z  * [new branch]              gh/ydwu4/335/orig           -> origin/gh/ydwu4/335/orig
2025-12-04T09:17:18.8829213Z  * [new branch]              gh/ydwu4/337/base           -> origin/gh/ydwu4/337/base
2025-12-04T09:17:18.8831374Z  * [new branch]              gh/ydwu4/337/head           -> origin/gh/ydwu4/337/head
2025-12-04T09:17:18.8832626Z  * [new branch]              gh/ydwu4/337/orig           -> origin/gh/ydwu4/337/orig
2025-12-04T09:17:18.8835565Z  * [new branch]              gh/ydwu4/339/base           -> origin/gh/ydwu4/339/base
2025-12-04T09:17:18.8837462Z  * [new branch]              gh/ydwu4/339/head           -> origin/gh/ydwu4/339/head
2025-12-04T09:17:18.8839312Z  * [new branch]              gh/ydwu4/339/orig           -> origin/gh/ydwu4/339/orig
2025-12-04T09:17:18.8842402Z  * [new branch]              gh/yf225/133/base           -> origin/gh/yf225/133/base
2025-12-04T09:17:18.8844998Z  * [new branch]              gh/yf225/133/head           -> origin/gh/yf225/133/head
2025-12-04T09:17:18.8847229Z  * [new branch]              gh/yf225/93/base            -> origin/gh/yf225/93/base
2025-12-04T09:17:18.8849398Z  * [new branch]              gh/yf225/93/head            -> origin/gh/yf225/93/head
2025-12-04T09:17:18.8852736Z  * [new branch]              gh/yifuwang/152/base        -> origin/gh/yifuwang/152/base
2025-12-04T09:17:18.8855220Z  * [new branch]              gh/yifuwang/152/head        -> origin/gh/yifuwang/152/head
2025-12-04T09:17:18.8856446Z  * [new branch]              gh/yifuwang/152/orig        -> origin/gh/yifuwang/152/orig
2025-12-04T09:17:18.8859348Z  * [new branch]              gh/yifuwang/195/base        -> origin/gh/yifuwang/195/base
2025-12-04T09:17:18.8861420Z  * [new branch]              gh/yifuwang/195/head        -> origin/gh/yifuwang/195/head
2025-12-04T09:17:18.8863332Z  * [new branch]              gh/yifuwang/195/orig        -> origin/gh/yifuwang/195/orig
2025-12-04T09:17:18.8866445Z  * [new branch]              gh/yiming0416/1/base        -> origin/gh/yiming0416/1/base
2025-12-04T09:17:18.8868588Z  * [new branch]              gh/yiming0416/1/head        -> origin/gh/yiming0416/1/head
2025-12-04T09:17:18.8870962Z  * [new branch]              gh/yiming0416/2/base        -> origin/gh/yiming0416/2/base
2025-12-04T09:17:18.8872062Z  * [new branch]              gh/yiming0416/2/head        -> origin/gh/yiming0416/2/head
2025-12-04T09:17:18.8876202Z  * [new branch]              gh/yushangdi/1/base         -> origin/gh/yushangdi/1/base
2025-12-04T09:17:18.8878369Z  * [new branch]              gh/yushangdi/1/head         -> origin/gh/yushangdi/1/head
2025-12-04T09:17:18.8880604Z  * [new branch]              gh/yushangdi/10/base        -> origin/gh/yushangdi/10/base
2025-12-04T09:17:18.8882582Z  * [new branch]              gh/yushangdi/10/head        -> origin/gh/yushangdi/10/head
2025-12-04T09:17:18.8883927Z  * [new branch]              gh/yushangdi/10/orig        -> origin/gh/yushangdi/10/orig
2025-12-04T09:17:18.8886816Z  * [new branch]              gh/yushangdi/11/base        -> origin/gh/yushangdi/11/base
2025-12-04T09:17:18.8888982Z  * [new branch]              gh/yushangdi/11/head        -> origin/gh/yushangdi/11/head
2025-12-04T09:17:18.8890180Z  * [new branch]              gh/yushangdi/11/orig        -> origin/gh/yushangdi/11/orig
2025-12-04T09:17:18.8900648Z  * [new branch]              gh/yushangdi/2/base         -> origin/gh/yushangdi/2/base
2025-12-04T09:17:18.8901366Z  * [new branch]              gh/yushangdi/2/head         -> origin/gh/yushangdi/2/head
2025-12-04T09:17:18.8901940Z  * [new branch]              gh/yushangdi/7/base         -> origin/gh/yushangdi/7/base
2025-12-04T09:17:18.8902504Z  * [new branch]              gh/yushangdi/7/head         -> origin/gh/yushangdi/7/head
2025-12-04T09:17:18.8903064Z  * [new branch]              gh/yushangdi/7/orig         -> origin/gh/yushangdi/7/orig
2025-12-04T09:17:18.8903948Z  * [new branch]              gh/yushangdi/8/base         -> origin/gh/yushangdi/8/base
2025-12-04T09:17:18.8906458Z  * [new branch]              gh/yushangdi/8/head         -> origin/gh/yushangdi/8/head
2025-12-04T09:17:18.8907948Z  * [new branch]              gh/yushangdi/8/orig         -> origin/gh/yushangdi/8/orig
2025-12-04T09:17:18.8910920Z  * [new branch]              gh/yushangdi/9/base         -> origin/gh/yushangdi/9/base
2025-12-04T09:17:18.8912906Z  * [new branch]              gh/yushangdi/9/head         -> origin/gh/yushangdi/9/head
2025-12-04T09:17:18.8914252Z  * [new branch]              gh/yushangdi/9/orig         -> origin/gh/yushangdi/9/orig
2025-12-04T09:17:18.8917782Z  * [new branch]              gh/zklaus/19/base           -> origin/gh/zklaus/19/base
2025-12-04T09:17:18.8919902Z  * [new branch]              gh/zklaus/19/head           -> origin/gh/zklaus/19/head
2025-12-04T09:17:18.8921071Z  * [new branch]              gh/zklaus/19/orig           -> origin/gh/zklaus/19/orig
2025-12-04T09:17:18.8924009Z  * [new branch]              gh/zklaus/20/base           -> origin/gh/zklaus/20/base
2025-12-04T09:17:18.8926347Z  * [new branch]              gh/zklaus/20/head           -> origin/gh/zklaus/20/head
2025-12-04T09:17:18.8927914Z  * [new branch]              gh/zklaus/20/orig           -> origin/gh/zklaus/20/orig
2025-12-04T09:17:18.8930688Z  * [new branch]              gh/zklaus/21/base           -> origin/gh/zklaus/21/base
2025-12-04T09:17:18.8932453Z  * [new branch]              gh/zklaus/21/head           -> origin/gh/zklaus/21/head
2025-12-04T09:17:18.8934271Z  * [new branch]              gh/zklaus/21/orig           -> origin/gh/zklaus/21/orig
2025-12-04T09:17:18.8937092Z  * [new branch]              gh/zklaus/22/base           -> origin/gh/zklaus/22/base
2025-12-04T09:17:18.8938430Z  * [new branch]              gh/zklaus/22/head           -> origin/gh/zklaus/22/head
2025-12-04T09:17:18.8940867Z  * [new branch]              gh/zklaus/22/orig           -> origin/gh/zklaus/22/orig
2025-12-04T09:17:18.8943165Z  * [new branch]              gh/zklaus/23/base           -> origin/gh/zklaus/23/base
2025-12-04T09:17:18.8944522Z  * [new branch]              gh/zklaus/23/head           -> origin/gh/zklaus/23/head
2025-12-04T09:17:18.8946941Z  * [new branch]              gh/zklaus/23/orig           -> origin/gh/zklaus/23/orig
2025-12-04T09:17:18.8949304Z  * [new branch]              gh/zklaus/24/base           -> origin/gh/zklaus/24/base
2025-12-04T09:17:18.8951312Z  * [new branch]              gh/zklaus/24/head           -> origin/gh/zklaus/24/head
2025-12-04T09:17:18.8952923Z  * [new branch]              gh/zklaus/24/orig           -> origin/gh/zklaus/24/orig
2025-12-04T09:17:18.8956601Z  * [new branch]              gh/zou3519/1197/base        -> origin/gh/zou3519/1197/base
2025-12-04T09:17:18.8957688Z  * [new branch]              gh/zou3519/1197/head        -> origin/gh/zou3519/1197/head
2025-12-04T09:17:18.8960041Z  * [new branch]              gh/zou3519/1197/orig        -> origin/gh/zou3519/1197/orig
2025-12-04T09:17:18.8962825Z  * [new branch]              gh/zou3519/1199/base        -> origin/gh/zou3519/1199/base
2025-12-04T09:17:18.8964744Z  * [new branch]              gh/zou3519/1199/head        -> origin/gh/zou3519/1199/head
2025-12-04T09:17:18.8966945Z  * [new branch]              gh/zou3519/1199/orig        -> origin/gh/zou3519/1199/orig
2025-12-04T09:17:18.8969172Z  * [new branch]              gh/zou3519/1200/base        -> origin/gh/zou3519/1200/base
2025-12-04T09:17:18.8971159Z  * [new branch]              gh/zou3519/1200/head        -> origin/gh/zou3519/1200/head
2025-12-04T09:17:18.8972485Z  * [new branch]              gh/zou3519/1200/orig        -> origin/gh/zou3519/1200/orig
2025-12-04T09:17:18.8975921Z  * [new branch]              gh/zou3519/1201/base        -> origin/gh/zou3519/1201/base
2025-12-04T09:17:18.8976763Z  * [new branch]              gh/zou3519/1201/head        -> origin/gh/zou3519/1201/head
2025-12-04T09:17:18.8978860Z  * [new branch]              gh/zou3519/1201/orig        -> origin/gh/zou3519/1201/orig
2025-12-04T09:17:18.8981563Z  * [new branch]              gh/zou3519/1202/base        -> origin/gh/zou3519/1202/base
2025-12-04T09:17:18.8982852Z  * [new branch]              gh/zou3519/1202/head        -> origin/gh/zou3519/1202/head
2025-12-04T09:17:18.8985147Z  * [new branch]              gh/zou3519/1202/orig        -> origin/gh/zou3519/1202/orig
2025-12-04T09:17:18.8988688Z  * [new branch]              gh/zpcore/1/base            -> origin/gh/zpcore/1/base
2025-12-04T09:17:18.8989794Z  * [new branch]              gh/zpcore/1/head            -> origin/gh/zpcore/1/head
2025-12-04T09:17:18.8992723Z  * [new branch]              gh/zpcore/11/base           -> origin/gh/zpcore/11/base
2025-12-04T09:17:18.8994751Z  * [new branch]              gh/zpcore/11/head           -> origin/gh/zpcore/11/head
2025-12-04T09:17:18.8997071Z  * [new branch]              gh/zpcore/11/orig           -> origin/gh/zpcore/11/orig
2025-12-04T09:17:18.8999826Z  * [new branch]              gh/zpcore/12/base           -> origin/gh/zpcore/12/base
2025-12-04T09:17:18.9001162Z  * [new branch]              gh/zpcore/12/head           -> origin/gh/zpcore/12/head
2025-12-04T09:17:18.9003345Z  * [new branch]              gh/zpcore/12/orig           -> origin/gh/zpcore/12/orig
2025-12-04T09:17:18.9006050Z  * [new branch]              gh/zpcore/13/base           -> origin/gh/zpcore/13/base
2025-12-04T09:17:18.9007636Z  * [new branch]              gh/zpcore/13/head           -> origin/gh/zpcore/13/head
2025-12-04T09:17:18.9009977Z  * [new branch]              gh/zpcore/13/orig           -> origin/gh/zpcore/13/orig
2025-12-04T09:17:18.9012675Z  * [new branch]              gh/zpcore/14/base           -> origin/gh/zpcore/14/base
2025-12-04T09:17:18.9014027Z  * [new branch]              gh/zpcore/14/head           -> origin/gh/zpcore/14/head
2025-12-04T09:17:18.9016154Z  * [new branch]              gh/zpcore/14/orig           -> origin/gh/zpcore/14/orig
2025-12-04T09:17:18.9018892Z  * [new branch]              gh/zpcore/15/base           -> origin/gh/zpcore/15/base
2025-12-04T09:17:18.9021092Z  * [new branch]              gh/zpcore/15/head           -> origin/gh/zpcore/15/head
2025-12-04T09:17:18.9022417Z  * [new branch]              gh/zpcore/15/orig           -> origin/gh/zpcore/15/orig
2025-12-04T09:17:18.9025303Z  * [new branch]              gh/zpcore/2/base            -> origin/gh/zpcore/2/base
2025-12-04T09:17:18.9027334Z  * [new branch]              gh/zpcore/2/head            -> origin/gh/zpcore/2/head
2025-12-04T09:17:18.9030320Z  * [new branch]              gh/zpcore/21/base           -> origin/gh/zpcore/21/base
2025-12-04T09:17:18.9032496Z  * [new branch]              gh/zpcore/21/head           -> origin/gh/zpcore/21/head
2025-12-04T09:17:18.9033633Z  * [new branch]              gh/zpcore/21/orig           -> origin/gh/zpcore/21/orig
2025-12-04T09:17:18.9037293Z  * [new branch]              gh/zpcore/22/base           -> origin/gh/zpcore/22/base
2025-12-04T09:17:18.9038881Z  * [new branch]              gh/zpcore/22/head           -> origin/gh/zpcore/22/head
2025-12-04T09:17:18.9041100Z  * [new branch]              gh/zpcore/22/orig           -> origin/gh/zpcore/22/orig
2025-12-04T09:17:18.9043716Z  * [new branch]              gh/zpcore/23/base           -> origin/gh/zpcore/23/base
2025-12-04T09:17:18.9045275Z  * [new branch]              gh/zpcore/23/head           -> origin/gh/zpcore/23/head
2025-12-04T09:17:18.9047252Z  * [new branch]              gh/zpcore/23/orig           -> origin/gh/zpcore/23/orig
2025-12-04T09:17:18.9049566Z  * [new branch]              gh/zpcore/24/base           -> origin/gh/zpcore/24/base
2025-12-04T09:17:18.9051451Z  * [new branch]              gh/zpcore/24/head           -> origin/gh/zpcore/24/head
2025-12-04T09:17:18.9053228Z  * [new branch]              gh/zpcore/24/orig           -> origin/gh/zpcore/24/orig
2025-12-04T09:17:18.9056070Z  * [new branch]              gh/zpcore/25/base           -> origin/gh/zpcore/25/base
2025-12-04T09:17:18.9057904Z  * [new branch]              gh/zpcore/25/head           -> origin/gh/zpcore/25/head
2025-12-04T09:17:18.9060550Z  * [new branch]              gh/zpcore/25/orig           -> origin/gh/zpcore/25/orig
2025-12-04T09:17:18.9063268Z  * [new branch]              gh/zpcore/26/base           -> origin/gh/zpcore/26/base
2025-12-04T09:17:18.9065304Z  * [new branch]              gh/zpcore/26/head           -> origin/gh/zpcore/26/head
2025-12-04T09:17:18.9067323Z  * [new branch]              gh/zpcore/26/orig           -> origin/gh/zpcore/26/orig
2025-12-04T09:17:18.9069927Z  * [new branch]              gh/zpcore/27/base           -> origin/gh/zpcore/27/base
2025-12-04T09:17:18.9071229Z  * [new branch]              gh/zpcore/27/head           -> origin/gh/zpcore/27/head
2025-12-04T09:17:18.9073727Z  * [new branch]              gh/zpcore/27/orig           -> origin/gh/zpcore/27/orig
2025-12-04T09:17:18.9076899Z  * [new branch]              gh/zpcore/28/base           -> origin/gh/zpcore/28/base
2025-12-04T09:17:18.9078950Z  * [new branch]              gh/zpcore/28/head           -> origin/gh/zpcore/28/head
2025-12-04T09:17:18.9080763Z  * [new branch]              gh/zpcore/28/orig           -> origin/gh/zpcore/28/orig
2025-12-04T09:17:18.9083113Z  * [new branch]              gh/zpcore/3/base            -> origin/gh/zpcore/3/base
2025-12-04T09:17:18.9084974Z  * [new branch]              gh/zpcore/3/head            -> origin/gh/zpcore/3/head
2025-12-04T09:17:18.9087852Z  * [new branch]              gh/zpcore/4/base            -> origin/gh/zpcore/4/base
2025-12-04T09:17:18.9090306Z  * [new branch]              gh/zpcore/4/head            -> origin/gh/zpcore/4/head
2025-12-04T09:17:18.9092533Z  * [new branch]              gh/zpcore/5/base            -> origin/gh/zpcore/5/base
2025-12-04T09:17:18.9094135Z  * [new branch]              gh/zpcore/5/head            -> origin/gh/zpcore/5/head
2025-12-04T09:17:18.9096798Z  * [new branch]              gh/zpcore/6/base            -> origin/gh/zpcore/6/base
2025-12-04T09:17:18.9098095Z  * [new branch]              gh/zpcore/6/head            -> origin/gh/zpcore/6/head
2025-12-04T09:17:18.9101638Z  * [new branch]              gh/zpcore/7/base            -> origin/gh/zpcore/7/base
2025-12-04T09:17:18.9102926Z  * [new branch]              gh/zpcore/7/head            -> origin/gh/zpcore/7/head
2025-12-04T09:17:18.9105730Z  * [new branch]              gh/zpcore/8/base            -> origin/gh/zpcore/8/base
2025-12-04T09:17:18.9108041Z  * [new branch]              gh/zpcore/8/head            -> origin/gh/zpcore/8/head
2025-12-04T09:17:18.9112296Z  * [new branch]              google-main                 -> origin/google-main
2025-12-04T09:17:18.9115191Z  * [new branch]              guangyey/external_stream    -> origin/guangyey/external_stream
2025-12-04T09:17:18.9116129Z  * [new branch]              guangyey/test_2025          -> origin/guangyey/test_2025
2025-12-04T09:17:18.9119244Z  * [new branch]              guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9
2025-12-04T09:17:18.9121693Z  * [new branch]              hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass
2025-12-04T09:17:18.9123254Z  * [new branch]              hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests
2025-12-04T09:17:18.9125104Z  * [new branch]              hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose
2025-12-04T09:17:18.9127228Z  * [new branch]              hc_baseline                 -> origin/hc_baseline
2025-12-04T09:17:18.9129507Z  * [new branch]              hhh_rand                    -> origin/hhh_rand
2025-12-04T09:17:18.9131614Z  * [new branch]              huba/f1                     -> origin/huba/f1
2025-12-04T09:17:18.9134294Z  * [new branch]              increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test
2025-12-04T09:17:18.9135206Z  * [new branch]              inlining                    -> origin/inlining
2025-12-04T09:17:18.9137577Z  * [new branch]              inlining-ezyang             -> origin/inlining-ezyang
2025-12-04T09:17:18.9139937Z  * [new branch]              install-torchao-0.13.0      -> origin/install-torchao-0.13.0
2025-12-04T09:17:18.9141572Z  * [new branch]              instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters
2025-12-04T09:17:18.9143142Z  * [new branch]              invoke-subgraph             -> origin/invoke-subgraph
2025-12-04T09:17:18.9145473Z  * [new branch]              issue#58739                 -> origin/issue#58739
2025-12-04T09:17:18.9147581Z  * [new branch]              jainapurva-patch-1          -> origin/jainapurva-patch-1
2025-12-04T09:17:18.9149955Z  * [new branch]              jathu/o3                    -> origin/jathu/o3
2025-12-04T09:17:18.9151669Z  * [new branch]              jathu/sve                   -> origin/jathu/sve
2025-12-04T09:17:18.9154411Z  * [new branch]              jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2
2025-12-04T09:17:18.9155797Z  * [new branch]              jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2
2025-12-04T09:17:18.9158758Z  * [new branch]              jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter
2025-12-04T09:17:18.9160105Z  * [new branch]              jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning
2025-12-04T09:17:18.9162396Z  * [new branch]              jithunnair-amd-patch-1      -> origin/jithunnair-amd-patch-1
2025-12-04T09:17:18.9164485Z  * [new branch]              jithunnair-amd-patch-10     -> origin/jithunnair-amd-patch-10
2025-12-04T09:17:18.9165904Z  * [new branch]              jithunnair-amd-patch-2      -> origin/jithunnair-amd-patch-2
2025-12-04T09:17:18.9168441Z  * [new branch]              jithunnair-amd-patch-3      -> origin/jithunnair-amd-patch-3
2025-12-04T09:17:18.9170312Z  * [new branch]              jithunnair-amd-patch-4      -> origin/jithunnair-amd-patch-4
2025-12-04T09:17:18.9171741Z  * [new branch]              jithunnair-amd-patch-5      -> origin/jithunnair-amd-patch-5
2025-12-04T09:17:18.9174080Z  * [new branch]              jithunnair-amd-patch-6      -> origin/jithunnair-amd-patch-6
2025-12-04T09:17:18.9175993Z  * [new branch]              jithunnair-amd-patch-7      -> origin/jithunnair-amd-patch-7
2025-12-04T09:17:18.9177948Z  * [new branch]              jithunnair-amd-patch-8      -> origin/jithunnair-amd-patch-8
2025-12-04T09:17:18.9180301Z  * [new branch]              jithunnair-amd-patch-9      -> origin/jithunnair-amd-patch-9
2025-12-04T09:17:18.9182871Z  * [new branch]              justinchu/native-qdq        -> origin/justinchu/native-qdq
2025-12-04T09:17:18.9185466Z  * [new branch]              kainan666/xlf_debug         -> origin/kainan666/xlf_debug
2025-12-04T09:17:18.9187268Z  * [new branch]              kainan_test                 -> origin/kainan_test
2025-12-04T09:17:18.9189433Z  * [new branch]              larryliu0820-patch-1        -> origin/larryliu0820-patch-1
2025-12-04T09:17:18.9192013Z  * [new branch]              leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues
2025-12-04T09:17:18.9194645Z  * [new branch]              lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error
2025-12-04T09:17:18.9197228Z  * [new branch]              liaoxuan/shm_all_reduce     -> origin/liaoxuan/shm_all_reduce
2025-12-04T09:17:18.9198478Z  * [new branch]              liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax
2025-12-04T09:17:18.9200597Z  * [new branch]              liaoxuan/test_int8_sdpa     -> origin/liaoxuan/test_int8_sdpa
2025-12-04T09:17:18.9201976Z  * [new branch]              llama4-stable               -> origin/llama4-stable
2025-12-04T09:17:18.9205513Z  * [new branch]              lts/release/1.8             -> origin/lts/release/1.8
2025-12-04T09:17:18.9208281Z  * [new branch]              lucaskabela/#94773          -> origin/lucaskabela/#94773
2025-12-04T09:17:18.9209599Z  * [new branch]              lucaskabela/fix_164876      -> origin/lucaskabela/fix_164876
2025-12-04T09:17:18.9211659Z  * [new branch]              lucaskabela/flop_counter    -> origin/lucaskabela/flop_counter
2025-12-04T09:17:18.9213045Z  * [new branch]              lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp
2025-12-04T09:17:18.9215299Z  * [new branch]              lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo
2025-12-04T09:17:18.9216620Z  * [new branch]              lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr
2025-12-04T09:17:18.9219317Z  * [new branch]              lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr
2025-12-04T09:17:18.9222007Z  * [new branch]              lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata
2025-12-04T09:17:18.9223142Z  * [new branch]              lucaskabela/rnn_decomp      -> origin/lucaskabela/rnn_decomp
2025-12-04T09:17:18.9225478Z  * [new branch]              lucaskabela/typing_backends -> origin/lucaskabela/typing_backends
2025-12-04T09:17:18.9226830Z  * [new branch]              lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager
2025-12-04T09:17:18.9228972Z  * [new branch]              lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module
2025-12-04T09:17:18.9230452Z  * [new branch]              lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined
2025-12-04T09:17:18.9232691Z  * [new branch]              lucaskabela/typing_variables -> origin/lucaskabela/typing_variables
2025-12-04T09:17:18.9234178Z  * [new branch]              lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts
2025-12-04T09:17:18.9236429Z  * [new branch]              lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions
2025-12-04T09:17:18.9237941Z  * [new branch]              lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists
2025-12-04T09:17:18.9240891Z  * [new branch]              lw/torch_box_by_ref         -> origin/lw/torch_box_by_ref
2025-12-04T09:17:18.9242732Z  * [new branch]              main                        -> origin/main
2025-12-04T09:17:18.9245205Z  * [new branch]              malfet-patch-1              -> origin/malfet-patch-1
2025-12-04T09:17:18.9247381Z  * [new branch]              malfet-patch-2              -> origin/malfet-patch-2
2025-12-04T09:17:18.9249484Z  * [new branch]              malfet-patch-3              -> origin/malfet-patch-3
2025-12-04T09:17:18.9251888Z  * [new branch]              malfet-patch-4              -> origin/malfet-patch-4
2025-12-04T09:17:18.9254094Z  * [new branch]              malfet-patch-5              -> origin/malfet-patch-5
2025-12-04T09:17:18.9255596Z  * [new branch]              malfet-patch-6              -> origin/malfet-patch-6
2025-12-04T09:17:18.9257847Z  * [new branch]              malfet-patch-7              -> origin/malfet-patch-7
2025-12-04T09:17:18.9260171Z  * [new branch]              malfet-patch-8              -> origin/malfet-patch-8
2025-12-04T09:17:18.9263119Z  * [new branch]              malfet/add-3.14-ci          -> origin/malfet/add-3.14-ci
2025-12-04T09:17:18.9264659Z  * [new branch]              malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts
2025-12-04T09:17:18.9266379Z  * [new branch]              malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch
2025-12-04T09:17:18.9268876Z  * [new branch]              malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers
2025-12-04T09:17:18.9270916Z  * [new branch]              malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im
2025-12-04T09:17:18.9273572Z  * [new branch]              manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe
2025-12-04T09:17:18.9274949Z  * [new branch]              manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp
2025-12-04T09:17:18.9278290Z  * [new branch]              masnesral/metaconda         -> origin/masnesral/metaconda
2025-12-04T09:17:18.9280390Z  * [new branch]              mem_profiler_flaky_fix      -> origin/mem_profiler_flaky_fix
2025-12-04T09:17:18.9282364Z  * [new branch]              mem_profiler_stack_trace    -> origin/mem_profiler_stack_trace
2025-12-04T09:17:18.9283997Z  * [new branch]              memory_profiler_stack       -> origin/memory_profiler_stack
2025-12-04T09:17:18.9286306Z  * [new branch]              metascroy-patch-1           -> origin/metascroy-patch-1
2025-12-04T09:17:18.9287748Z  * [new branch]              mingw_posix                 -> origin/mingw_posix
2025-12-04T09:17:18.9291258Z  * [new branch]              mlazos/S429861-debug        -> origin/mlazos/S429861-debug
2025-12-04T09:17:18.9292082Z  * [new branch]              mlazos/aa                   -> origin/mlazos/aa
2025-12-04T09:17:18.9294053Z  * [new branch]              mlazos/acts                 -> origin/mlazos/acts
2025-12-04T09:17:18.9295701Z  * [new branch]              mlazos/arg-renames          -> origin/mlazos/arg-renames
2025-12-04T09:17:18.9297707Z  * [new branch]              mlazos/bad-cudagraphs       -> origin/mlazos/bad-cudagraphs
2025-12-04T09:17:18.9299107Z  * [new branch]              mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks
2025-12-04T09:17:18.9301105Z  * [new branch]              mlazos/beta-tensor          -> origin/mlazos/beta-tensor
2025-12-04T09:17:18.9302893Z  * [new branch]              mlazos/buffers              -> origin/mlazos/buffers
2025-12-04T09:17:18.9304167Z  * [new branch]              mlazos/buffers2             -> origin/mlazos/buffers2
2025-12-04T09:17:18.9306688Z  * [new branch]              mlazos/buffers3             -> origin/mlazos/buffers3
2025-12-04T09:17:18.9308912Z  * [new branch]              mlazos/bwd                  -> origin/mlazos/bwd
2025-12-04T09:17:18.9310835Z  * [new branch]              mlazos/combo-test           -> origin/mlazos/combo-test
2025-12-04T09:17:18.9312709Z  * [new branch]              mlazos/ctx-cleanup          -> origin/mlazos/ctx-cleanup
2025-12-04T09:17:18.9314592Z  * [new branch]              mlazos/cuda-cmd-log         -> origin/mlazos/cuda-cmd-log
2025-12-04T09:17:18.9316547Z  * [new branch]              mlazos/cudagraph-tests      -> origin/mlazos/cudagraph-tests
2025-12-04T09:17:18.9318281Z  * [new branch]              mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement
2025-12-04T09:17:18.9320277Z  * [new branch]              mlazos/cutlass-test         -> origin/mlazos/cutlass-test
2025-12-04T09:17:18.9322421Z  * [new branch]              mlazos/cutlass-topo-bug     -> origin/mlazos/cutlass-topo-bug
2025-12-04T09:17:18.9324104Z  * [new branch]              mlazos/dataclass-proxy      -> origin/mlazos/dataclass-proxy
2025-12-04T09:17:18.9326280Z  * [new branch]              mlazos/dc-attrs             -> origin/mlazos/dc-attrs
2025-12-04T09:17:18.9327637Z  * [new branch]              mlazos/dc-helion            -> origin/mlazos/dc-helion
2025-12-04T09:17:18.9329803Z  * [new branch]              mlazos/dict-fix             -> origin/mlazos/dict-fix
2025-12-04T09:17:18.9331673Z  * [new branch]              mlazos/disable-tf           -> origin/mlazos/disable-tf
2025-12-04T09:17:18.9333513Z  * [new branch]              mlazos/dupe-fix             -> origin/mlazos/dupe-fix
2025-12-04T09:17:18.9335449Z  * [new branch]              mlazos/dyn-batch            -> origin/mlazos/dyn-batch
2025-12-04T09:17:18.9337494Z  * [new branch]              mlazos/evt                  -> origin/mlazos/evt
2025-12-04T09:17:18.9339499Z  * [new branch]              mlazos/extract-examples     -> origin/mlazos/extract-examples
2025-12-04T09:17:18.9341329Z  * [new branch]              mlazos/foreach-op           -> origin/mlazos/foreach-op
2025-12-04T09:17:18.9343126Z  * [new branch]              mlazos/fp8                  -> origin/mlazos/fp8
2025-12-04T09:17:18.9345028Z  * [new branch]              mlazos/fp8-bias             -> origin/mlazos/fp8-bias
2025-12-04T09:17:18.9346890Z  * [new branch]              mlazos/fp8-bias-fusion      -> origin/mlazos/fp8-bias-fusion
2025-12-04T09:17:18.9348696Z  * [new branch]              mlazos/fp8-fixes            -> origin/mlazos/fp8-fixes
2025-12-04T09:17:18.9350533Z  * [new branch]              mlazos/freezing             -> origin/mlazos/freezing
2025-12-04T09:17:18.9352403Z  * [new branch]              mlazos/h-comp               -> origin/mlazos/h-comp
2025-12-04T09:17:18.9354308Z  * [new branch]              mlazos/h-comp2              -> origin/mlazos/h-comp2
2025-12-04T09:17:18.9356426Z  * [new branch]              mlazos/hash-hop             -> origin/mlazos/hash-hop
2025-12-04T09:17:18.9357958Z  * [new branch]              mlazos/hc                   -> origin/mlazos/hc
2025-12-04T09:17:18.9360060Z  * [new branch]              mlazos/hc-cycles            -> origin/mlazos/hc-cycles
2025-12-04T09:17:18.9361877Z  * [new branch]              mlazos/hc-fixes             -> origin/mlazos/hc-fixes
2025-12-04T09:17:18.9363756Z  * [new branch]              mlazos/hc-fixes3            -> origin/mlazos/hc-fixes3
2025-12-04T09:17:18.9365584Z  * [new branch]              mlazos/hc-fixes4            -> origin/mlazos/hc-fixes4
2025-12-04T09:17:18.9367551Z  * [new branch]              mlazos/hc-hf                -> origin/mlazos/hc-hf
2025-12-04T09:17:18.9369398Z  * [new branch]              mlazos/hc-mut               -> origin/mlazos/hc-mut
2025-12-04T09:17:18.9371313Z  * [new branch]              mlazos/hc10                 -> origin/mlazos/hc10
2025-12-04T09:17:18.9373144Z  * [new branch]              mlazos/hc11                 -> origin/mlazos/hc11
2025-12-04T09:17:18.9374959Z  * [new branch]              mlazos/hc12                 -> origin/mlazos/hc12
2025-12-04T09:17:18.9376979Z  * [new branch]              mlazos/hc13                 -> origin/mlazos/hc13
2025-12-04T09:17:18.9379256Z  * [new branch]              mlazos/hc14                 -> origin/mlazos/hc14
2025-12-04T09:17:18.9381128Z  * [new branch]              mlazos/hc15                 -> origin/mlazos/hc15
2025-12-04T09:17:18.9383193Z  * [new branch]              mlazos/hc2                  -> origin/mlazos/hc2
2025-12-04T09:17:18.9384719Z  * [new branch]              mlazos/hc4                  -> origin/mlazos/hc4
2025-12-04T09:17:18.9386719Z  * [new branch]              mlazos/hc5                  -> origin/mlazos/hc5
2025-12-04T09:17:18.9389033Z  * [new branch]              mlazos/hc6                  -> origin/mlazos/hc6
2025-12-04T09:17:18.9390865Z  * [new branch]              mlazos/hc7                  -> origin/mlazos/hc7
2025-12-04T09:17:18.9392540Z  * [new branch]              mlazos/hc8                  -> origin/mlazos/hc8
2025-12-04T09:17:18.9394438Z  * [new branch]              mlazos/hc9                  -> origin/mlazos/hc9
2025-12-04T09:17:18.9396339Z  * [new branch]              mlazos/hc_baseline2         -> origin/mlazos/hc_baseline2
2025-12-04T09:17:18.9398232Z  * [new branch]              mlazos/inductor-streams     -> origin/mlazos/inductor-streams
2025-12-04T09:17:18.9399546Z  * [new branch]              mlazos/main                 -> origin/mlazos/main
2025-12-04T09:17:18.9401793Z  * [new branch]              mlazos/mcg2                 -> origin/mlazos/mcg2
2025-12-04T09:17:18.9403702Z  * [new branch]              mlazos/meta-guards          -> origin/mlazos/meta-guards
2025-12-04T09:17:18.9406391Z  * [new branch]              mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam
2025-12-04T09:17:18.9407985Z  * [new branch]              mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup
2025-12-04T09:17:18.9410214Z  * [new branch]              mlazos/mod-fix              -> origin/mlazos/mod-fix
2025-12-04T09:17:18.9412171Z  * [new branch]              mlazos/mode-fix             -> origin/mlazos/mode-fix
2025-12-04T09:17:18.9413969Z  * [new branch]              mlazos/offsets              -> origin/mlazos/offsets
2025-12-04T09:17:18.9415585Z  * [new branch]              mlazos/overguarding         -> origin/mlazos/overguarding
2025-12-04T09:17:18.9417866Z  * [new branch]              mlazos/proxy-ctors          -> origin/mlazos/proxy-ctors
2025-12-04T09:17:18.9419610Z  * [new branch]              mlazos/quant-fix            -> origin/mlazos/quant-fix
2025-12-04T09:17:18.9421586Z  * [new branch]              mlazos/resnet-fix           -> origin/mlazos/resnet-fix
2025-12-04T09:17:18.9423442Z  * [new branch]              mlazos/rm-buf-names         -> origin/mlazos/rm-buf-names
2025-12-04T09:17:18.9425296Z  * [new branch]              mlazos/rm-code              -> origin/mlazos/rm-code
2025-12-04T09:17:18.9427305Z  * [new branch]              mlazos/rm-spam              -> origin/mlazos/rm-spam
2025-12-04T09:17:18.9429177Z  * [new branch]              mlazos/rtp                  -> origin/mlazos/rtp
2025-12-04T09:17:18.9431073Z  * [new branch]              mlazos/static-idx-dbg       -> origin/mlazos/static-idx-dbg
2025-12-04T09:17:18.9432978Z  * [new branch]              mlazos/static-inputs-log    -> origin/mlazos/static-inputs-log
2025-12-04T09:17:18.9434316Z  * [new branch]              mlazos/stests               -> origin/mlazos/stests
2025-12-04T09:17:18.9436538Z  * [new branch]              mlazos/stream-ops           -> origin/mlazos/stream-ops
2025-12-04T09:17:18.9438433Z  * [new branch]              mlazos/td-fix2              -> origin/mlazos/td-fix2
2025-12-04T09:17:18.9440334Z  * [new branch]              mlazos/tensor-hasattr2      -> origin/mlazos/tensor-hasattr2
2025-12-04T09:17:18.9442165Z  * [new branch]              mlazos/test                 -> origin/mlazos/test
2025-12-04T09:17:18.9444054Z  * [new branch]              mlazos/tf-mode              -> origin/mlazos/tf-mode
2025-12-04T09:17:18.9446138Z  * [new branch]              mlazos/tf-mode-backup2      -> origin/mlazos/tf-mode-backup2
2025-12-04T09:17:18.9447490Z  * [new branch]              mlazos/tf-mode-reland       -> origin/mlazos/tf-mode-reland
2025-12-04T09:17:18.9449804Z  * [new branch]              mlazos/tf-mode-reland2      -> origin/mlazos/tf-mode-reland2
2025-12-04T09:17:18.9451887Z  * [new branch]              mlazos/tf-mode-reland3      -> origin/mlazos/tf-mode-reland3
2025-12-04T09:17:18.9453159Z  * [new branch]              mlazos/triton-no-epi        -> origin/mlazos/triton-no-epi
2025-12-04T09:17:18.9455346Z  * [new branch]              mlazos/tune-proto           -> origin/mlazos/tune-proto
2025-12-04T09:17:18.9457342Z  * [new branch]              mlazos/tuple-fixes          -> origin/mlazos/tuple-fixes
2025-12-04T09:17:18.9459415Z  * [new branch]              mlazos/tuple-fixes2         -> origin/mlazos/tuple-fixes2
2025-12-04T09:17:18.9461073Z  * [new branch]              mlazos/tuple-handling       -> origin/mlazos/tuple-handling
2025-12-04T09:17:18.9463193Z  * [new branch]              mlazos/user-stream-base     -> origin/mlazos/user-stream-base
2025-12-04T09:17:18.9464659Z  * [new branch]              mlazos/user-streams         -> origin/mlazos/user-streams
2025-12-04T09:17:18.9467099Z  * [new branch]              mlazos/user-streams-backup  -> origin/mlazos/user-streams-backup
2025-12-04T09:17:18.9468439Z  * [new branch]              mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2
2025-12-04T09:17:18.9470617Z  * [new branch]              mlazos/vary-beta            -> origin/mlazos/vary-beta
2025-12-04T09:17:18.9472630Z  * [new branch]              mlazos/vary-beta2           -> origin/mlazos/vary-beta2
2025-12-04T09:17:18.9473974Z  * [new branch]              mlazos/weird-perf1          -> origin/mlazos/weird-perf1
2025-12-04T09:17:18.9476297Z  * [new branch]              mm_out_dtype_compile        -> origin/mm_out_dtype_compile
2025-12-04T09:17:18.9478295Z  * [new branch]              module-shim                 -> origin/module-shim
2025-12-04T09:17:18.9480267Z  * [new branch]              move_config                 -> origin/move_config
2025-12-04T09:17:18.9483076Z  * [new branch]              msaroufim/reduce            -> origin/msaroufim/reduce
2025-12-04T09:17:18.9485731Z  * [new branch]              mtia/basic-cmake            -> origin/mtia/basic-cmake
2025-12-04T09:17:18.9488335Z  * [new branch]              mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape
2025-12-04T09:17:18.9490346Z  * [new branch]              my_varlen_backup            -> origin/my_varlen_backup
2025-12-04T09:17:18.9491935Z  * [new branch]              nativert_num_outputs        -> origin/nativert_num_outputs
2025-12-04T09:17:18.9494171Z  * [new branch]              new-codegen                 -> origin/new-codegen
2025-12-04T09:17:18.9495983Z  * [new branch]              newtest-base                -> origin/newtest-base
2025-12-04T09:17:18.9498629Z  * [new branch]              ngimel/addmm_dtype          -> origin/ngimel/addmm_dtype
2025-12-04T09:17:18.9500304Z  * [new branch]              ngimel/div_inv              -> origin/ngimel/div_inv
2025-12-04T09:17:18.9501643Z  * [new branch]              ngimel/error_index_list     -> origin/ngimel/error_index_list
2025-12-04T09:17:18.9504079Z  * [new branch]              ngimel/gather_grid          -> origin/ngimel/gather_grid
2025-12-04T09:17:18.9505240Z  * [new branch]              ngimel/gather_grid_release  -> origin/ngimel/gather_grid_release
2025-12-04T09:17:18.9507366Z  * [new branch]              ngimel/gg_new               -> origin/ngimel/gg_new
2025-12-04T09:17:18.9509799Z  * [new branch]              ngimel/hostalloc            -> origin/ngimel/hostalloc
2025-12-04T09:17:18.9512227Z  * [new branch]              ngimel/storage_id           -> origin/ngimel/storage_id
2025-12-04T09:17:18.9514260Z  * [new branch]              nightly                     -> origin/nightly
2025-12-04T09:17:18.9516992Z  * [new branch]              nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check
2025-12-04T09:17:18.9518441Z  * [new branch]              nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias
2025-12-04T09:17:18.9520553Z  * [new branch]              nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor
2025-12-04T09:17:18.9522995Z  * [new branch]              nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch
2025-12-04T09:17:18.9524381Z  * [new branch]              nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions
2025-12-04T09:17:18.9526877Z  * [new branch]              nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index
2025-12-04T09:17:18.9528466Z  * [new branch]              nikitaved/test              -> origin/nikitaved/test
2025-12-04T09:17:18.9531005Z  * [new branch]              nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune
2025-12-04T09:17:18.9532495Z  * [new branch]              no_distributed_log_spew     -> origin/no_distributed_log_spew
2025-12-04T09:17:18.9534741Z  * [new branch]              nofun-hack                  -> origin/nofun-hack
2025-12-04T09:17:18.9536814Z  * [new branch]              norm_bench                  -> origin/norm_bench
2025-12-04T09:17:18.9539474Z  * [new branch]              nullplay/fuse_matmul        -> origin/nullplay/fuse_matmul
2025-12-04T09:17:18.9541548Z  * [new branch]              nullplay_fuse_matmul        -> origin/nullplay_fuse_matmul
2025-12-04T09:17:18.9543449Z  * [new branch]              optimizer_test              -> origin/optimizer_test
2025-12-04T09:17:18.9546671Z  * [new branch]              orig/release/1.10           -> origin/orig/release/1.10
2025-12-04T09:17:18.9548575Z  * [new branch]              orig/release/1.11           -> origin/orig/release/1.11
2025-12-04T09:17:18.9550430Z  * [new branch]              orig/release/1.12           -> origin/orig/release/1.12
2025-12-04T09:17:18.9552470Z  * [new branch]              orig/release/1.13           -> origin/orig/release/1.13
2025-12-04T09:17:18.9554417Z  * [new branch]              orig/release/1.6            -> origin/orig/release/1.6
2025-12-04T09:17:18.9556392Z  * [new branch]              orig/release/1.7            -> origin/orig/release/1.7
2025-12-04T09:17:18.9558287Z  * [new branch]              orig/release/1.8            -> origin/orig/release/1.8
2025-12-04T09:17:18.9560190Z  * [new branch]              orig/release/1.9            -> origin/orig/release/1.9
2025-12-04T09:17:18.9562057Z  * [new branch]              orig/release/2.0            -> origin/orig/release/2.0
2025-12-04T09:17:18.9563878Z  * [new branch]              orig/release/2.1            -> origin/orig/release/2.1
2025-12-04T09:17:18.9566020Z  * [new branch]              orig/release/2.2            -> origin/orig/release/2.2
2025-12-04T09:17:18.9567398Z  * [new branch]              orig/release/2.3            -> origin/orig/release/2.3
2025-12-04T09:17:18.9569656Z  * [new branch]              orig/release/2.4            -> origin/orig/release/2.4
2025-12-04T09:17:18.9572084Z  * [new branch]              orig/release/2.5            -> origin/orig/release/2.5
2025-12-04T09:17:18.9573396Z  * [new branch]              orig/release/2.6            -> origin/orig/release/2.6
2025-12-04T09:17:18.9576536Z  * [new branch]              orig/release/2.7            -> origin/orig/release/2.7
2025-12-04T09:17:18.9579168Z  * [new branch]              orig/release/2.8            -> origin/orig/release/2.8
2025-12-04T09:17:18.9580950Z  * [new branch]              orig/release/2.9            -> origin/orig/release/2.9
2025-12-04T09:17:18.9585085Z  * [new branch]              origin/gh/fxdawnn/1/base    -> origin/origin/gh/fxdawnn/1/base
2025-12-04T09:17:18.9586484Z  * [new branch]              origin/gh/fxdawnn/1/orig    -> origin/origin/gh/fxdawnn/1/orig
2025-12-04T09:17:18.9589993Z  * [new branch]              origin/gh/zpcore/14/orig    -> origin/origin/gh/zpcore/14/orig
2025-12-04T09:17:18.9592063Z  * [new branch]              oulgen-patch-1              -> origin/oulgen-patch-1
2025-12-04T09:17:18.9594329Z  * [new branch]              oulgen-patch-2              -> origin/oulgen-patch-2
2025-12-04T09:17:18.9596272Z  * [new branch]              oulgen-patch-3              -> origin/oulgen-patch-3
2025-12-04T09:17:18.9598254Z  * [new branch]              oulgen-patch-4              -> origin/oulgen-patch-4
2025-12-04T09:17:18.9600246Z  * [new branch]              padded-tensor               -> origin/padded-tensor
2025-12-04T09:17:18.9602143Z  * [new branch]              pca2                        -> origin/pca2
2025-12-04T09:17:18.9604377Z  * [new branch]              per_channel_backup          -> origin/per_channel_backup
2025-12-04T09:17:18.9606398Z  * [new branch]              perf_ops                    -> origin/perf_ops
2025-12-04T09:17:18.9608164Z  * [new branch]              perf_ops_2_9                -> origin/perf_ops_2_9
2025-12-04T09:17:18.9610538Z  * [new branch]              pianpwk-patch-1             -> origin/pianpwk-patch-1
2025-12-04T09:17:18.9613109Z  * [new branch]              pianpwk/__draft_debug_mode  -> origin/pianpwk/__draft_debug_mode
2025-12-04T09:17:18.9614503Z  * [new branch]              pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft
2025-12-04T09:17:18.9616535Z  * [new branch]              pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile
2025-12-04T09:17:18.9617922Z  * [new branch]              pianpwk/_draft_triton_11_3  -> origin/pianpwk/_draft_triton_11_3
2025-12-04T09:17:18.9620248Z  * [new branch]              pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft
2025-12-04T09:17:18.9622281Z  * [new branch]              pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys
2025-12-04T09:17:18.9624419Z  * [new branch]              pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode
2025-12-04T09:17:18.9626484Z  * [new branch]              pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size
2025-12-04T09:17:18.9628358Z  * [new branch]              pianpwk/anomaly_tb          -> origin/pianpwk/anomaly_tb
2025-12-04T09:17:18.9629750Z  * [new branch]              pianpwk/auto_fx_annotate    -> origin/pianpwk/auto_fx_annotate
2025-12-04T09:17:18.9631974Z  * [new branch]              pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export
2025-12-04T09:17:18.9633365Z  * [new branch]              pianpwk/bert_dynamic_perf   -> origin/pianpwk/bert_dynamic_perf
2025-12-04T09:17:18.9635649Z  * [new branch]              pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces
2025-12-04T09:17:18.9637561Z  * [new branch]              pianpwk/debug_hash_tensor   -> origin/pianpwk/debug_hash_tensor
2025-12-04T09:17:18.9639433Z  * [new branch]              pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate
2025-12-04T09:17:18.9641184Z  * [new branch]              pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults
2025-12-04T09:17:18.9650464Z  * [new branch]              pianpwk/debug_mode_hacks    -> origin/pianpwk/debug_mode_hacks
2025-12-04T09:17:18.9651329Z  * [new branch]              pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor
2025-12-04T09:17:18.9652018Z  * [new branch]              pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids
2025-12-04T09:17:18.9652697Z  * [new branch]              pianpwk/debug_mode_triton   -> origin/pianpwk/debug_mode_triton
2025-12-04T09:17:18.9653331Z  * [new branch]              pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace
2025-12-04T09:17:18.9654079Z  * [new branch]              pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective
2025-12-04T09:17:18.9654769Z  * [new branch]              pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf
2025-12-04T09:17:18.9656135Z  * [new branch]              pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug
2025-12-04T09:17:18.9658136Z  * [new branch]              pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile
2025-12-04T09:17:18.9659999Z  * [new branch]              pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn
2025-12-04T09:17:18.9662384Z  * [new branch]              pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5
2025-12-04T09:17:18.9663672Z  * [new branch]              pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk
2025-12-04T09:17:18.9665794Z  * [new branch]              pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath
2025-12-04T09:17:18.9667842Z  * [new branch]              pianpwk/event_list_tree     -> origin/pianpwk/event_list_tree
2025-12-04T09:17:18.9669627Z  * [new branch]              pianpwk/false_numel_refs    -> origin/pianpwk/false_numel_refs
2025-12-04T09:17:18.9671452Z  * [new branch]              pianpwk/maybe_guard_rel     -> origin/pianpwk/maybe_guard_rel
2025-12-04T09:17:18.9673383Z  * [new branch]              pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft
2025-12-04T09:17:18.9675850Z  * [new branch]              pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat
2025-12-04T09:17:18.9677716Z  * [new branch]              pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better
2025-12-04T09:17:18.9679496Z  * [new branch]              pianpwk/pre_forward_hook    -> origin/pianpwk/pre_forward_hook
2025-12-04T09:17:18.9681416Z  * [new branch]              pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate
2025-12-04T09:17:18.9683271Z  * [new branch]              pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards
2025-12-04T09:17:18.9685072Z  * [new branch]              pianpwk/sym_tokens_draft    -> origin/pianpwk/sym_tokens_draft
2025-12-04T09:17:18.9687139Z  * [new branch]              pianpwk/symint_one_hot      -> origin/pianpwk/symint_one_hot
2025-12-04T09:17:18.9689107Z  * [new branch]              pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false
2025-12-04T09:17:18.9690877Z  * [new branch]              pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap
2025-12-04T09:17:18.9692647Z  * [new branch]              pianpwk/try_dumb_stuff      -> origin/pianpwk/try_dumb_stuff
2025-12-04T09:17:18.9694564Z  * [new branch]              pianpwk/try_dumb_stuff_2    -> origin/pianpwk/try_dumb_stuff_2
2025-12-04T09:17:18.9696391Z  * [new branch]              pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm
2025-12-04T09:17:18.9698263Z  * [new branch]              pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2
2025-12-04T09:17:18.9700166Z  * [new branch]              pianpwk/user_symints        -> origin/pianpwk/user_symints
2025-12-04T09:17:18.9702082Z  * [new branch]              pianpwk/wan21_reshape       -> origin/pianpwk/wan21_reshape
2025-12-04T09:17:18.9704593Z  * [new branch]              piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112
2025-12-04T09:17:18.9706272Z  * [new branch]              piz/prop_cache_clean        -> origin/piz/prop_cache_clean
2025-12-04T09:17:18.9708345Z  * [new branch]              pool-separate               -> origin/pool-separate
2025-12-04T09:17:18.9710447Z  * [new branch]              pr-156087                   -> origin/pr-156087
2025-12-04T09:17:18.9712910Z  * [new branch]              pr/131860                   -> origin/pr/131860
2025-12-04T09:17:18.9715050Z  * [new branch]              predispatch_to              -> origin/predispatch_to
2025-12-04T09:17:18.9716937Z  * [new branch]              protect-c17                 -> origin/protect-c17
2025-12-04T09:17:18.9718861Z  * [new branch]              pt-opt-cuda3                -> origin/pt-opt-cuda3
2025-12-04T09:17:18.9721320Z  * [new branch]              python_compiled_autograd    -> origin/python_compiled_autograd
2025-12-04T09:17:18.9724093Z  * [new branch]              q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown
2025-12-04T09:17:18.9725739Z  * [new branch]              q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args
2025-12-04T09:17:18.9728790Z  * [new branch]              qchip/export-D54134695      -> origin/qchip/export-D54134695
2025-12-04T09:17:18.9730784Z  * [new branch]              quote-pytest_cache          -> origin/quote-pytest_cache
2025-12-04T09:17:18.9732990Z  * [new branch]              reland-accgrad-stream-warn  -> origin/reland-accgrad-stream-warn
2025-12-04T09:17:18.9735780Z  * [new branch]              release/1.10                -> origin/release/1.10
2025-12-04T09:17:18.9737483Z  * [new branch]              release/1.11                -> origin/release/1.11
2025-12-04T09:17:18.9739378Z  * [new branch]              release/1.12                -> origin/release/1.12
2025-12-04T09:17:18.9741272Z  * [new branch]              release/1.13                -> origin/release/1.13
2025-12-04T09:17:18.9743122Z  * [new branch]              release/1.4                 -> origin/release/1.4
2025-12-04T09:17:18.9744725Z  * [new branch]              release/1.4.1               -> origin/release/1.4.1
2025-12-04T09:17:18.9746604Z  * [new branch]              release/1.5                 -> origin/release/1.5
2025-12-04T09:17:18.9748444Z  * [new branch]              release/1.6                 -> origin/release/1.6
2025-12-04T09:17:18.9750902Z  * [new branch]              release/1.7                 -> origin/release/1.7
2025-12-04T09:17:18.9752822Z  * [new branch]              release/1.8                 -> origin/release/1.8
2025-12-04T09:17:18.9754647Z  * [new branch]              release/1.9                 -> origin/release/1.9
2025-12-04T09:17:18.9756491Z  * [new branch]              release/2.0                 -> origin/release/2.0
2025-12-04T09:17:18.9758520Z  * [new branch]              release/2.1                 -> origin/release/2.1
2025-12-04T09:17:18.9760338Z  * [new branch]              release/2.2                 -> origin/release/2.2
2025-12-04T09:17:18.9762528Z  * [new branch]              release/2.3                 -> origin/release/2.3
2025-12-04T09:17:18.9764869Z  * [new branch]              release/2.4                 -> origin/release/2.4
2025-12-04T09:17:18.9767250Z  * [new branch]              release/2.5                 -> origin/release/2.5
2025-12-04T09:17:18.9769378Z  * [new branch]              release/2.6                 -> origin/release/2.6
2025-12-04T09:17:18.9771366Z  * [new branch]              release/2.7                 -> origin/release/2.7
2025-12-04T09:17:18.9773232Z  * [new branch]              release/2.8                 -> origin/release/2.8
2025-12-04T09:17:18.9775388Z  * [new branch]              release/2.9                 -> origin/release/2.9
2025-12-04T09:17:18.9777272Z  * [new branch]              release_notes               -> origin/release_notes
2025-12-04T09:17:18.9779382Z  * [new branch]              remove_pyinterpreter        -> origin/remove_pyinterpreter
2025-12-04T09:17:18.9781693Z  * [new branch]              replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836
2025-12-04T09:17:18.9783441Z  * [new branch]              replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248
2025-12-04T09:17:18.9785114Z  * [new branch]              replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324
2025-12-04T09:17:18.9786987Z  * [new branch]              replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020
2025-12-04T09:17:18.9790683Z  * [new branch]              revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head
2025-12-04T09:17:18.9794263Z  * [new branch]              revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head
2025-12-04T09:17:18.9797982Z  * [new branch]              revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head
2025-12-04T09:17:18.9801931Z  * [new branch]              revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head
2025-12-04T09:17:18.9804171Z  * [new branch]              revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_
2025-12-04T09:17:18.9805934Z  * [new branch]              revert-hoo-invoke-subgraph  -> origin/revert-hoo-invoke-subgraph
2025-12-04T09:17:18.9807939Z  * [new branch]              revert_always_build_distributed -> origin/revert_always_build_distributed
2025-12-04T09:17:18.9809885Z  * [new branch]              rms_norm_patch              -> origin/rms_norm_patch
2025-12-04T09:17:18.9812614Z  * [new branch]              ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation
2025-12-04T09:17:18.9814126Z  * [new branch]              ruisi/fix_comm_estimation   -> origin/ruisi/fix_comm_estimation
2025-12-04T09:17:18.9815987Z  * [new branch]              ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation
2025-12-04T09:17:18.9817685Z  * [new branch]              ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing
2025-12-04T09:17:18.9819901Z  * [new branch]              ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass
2025-12-04T09:17:18.9822057Z  * [new branch]              ruisi/manual_bucket_pass    -> origin/ruisi/manual_bucket_pass
2025-12-04T09:17:18.9824862Z  * [new branch]              ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures
2025-12-04T09:17:18.9826703Z  * [new branch]              ryanguo99/fix-closure-var   -> origin/ryanguo99/fix-closure-var
2025-12-04T09:17:18.9829203Z  * [new branch]              rzou/faketensor_bench       -> origin/rzou/faketensor_bench
2025-12-04T09:17:18.9830884Z  * [new branch]              rzou/njt                    -> origin/rzou/njt
2025-12-04T09:17:18.9832726Z  * [new branch]              rzou/pca                    -> origin/rzou/pca
2025-12-04T09:17:18.9834443Z  * [new branch]              rzou/realprop               -> origin/rzou/realprop
2025-12-04T09:17:18.9836442Z  * [new branch]              samplevllm                  -> origin/samplevllm
2025-12-04T09:17:18.9839384Z  * [new branch]              sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm
2025-12-04T09:17:18.9841164Z  * [new branch]              sapling-pr-archive-SS-JIA   -> origin/sapling-pr-archive-SS-JIA
2025-12-04T09:17:18.9843161Z  * [new branch]              sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain
2025-12-04T09:17:18.9845062Z  * [new branch]              save                        -> origin/save
2025-12-04T09:17:18.9847097Z  * [new branch]              scaled_mm                   -> origin/scaled_mm
2025-12-04T09:17:18.9849031Z  * [new branch]              scan_attempt                -> origin/scan_attempt
2025-12-04T09:17:18.9851581Z  * [new branch]              sdym/2.5.1                  -> origin/sdym/2.5.1
2025-12-04T09:17:18.9854052Z  * [new branch]              sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix
2025-12-04T09:17:18.9856895Z  * [new branch]              shengf/fx-xform-perf        -> origin/shengf/fx-xform-perf
2025-12-04T09:17:18.9858937Z  * [new branch]              shoumikhin-patch-1          -> origin/shoumikhin-patch-1
2025-12-04T09:17:18.9861042Z  * [new branch]              solve-accuracy-fix          -> origin/solve-accuracy-fix
2025-12-04T09:17:18.9862944Z  * [new branch]              some_rocm_inductor_skips    -> origin/some_rocm_inductor_skips
2025-12-04T09:17:18.9865432Z  * [new branch]              soulitzer/stash-tls-ac      -> origin/soulitzer/stash-tls-ac
2025-12-04T09:17:18.9867389Z  * [new branch]              sparse-mm-bf16-support      -> origin/sparse-mm-bf16-support
2025-12-04T09:17:18.9869306Z  * [new branch]              starterTaskUpdate           -> origin/starterTaskUpdate
2025-12-04T09:17:18.9871255Z  * [new branch]              suo                         -> origin/suo
2025-12-04T09:17:18.9873110Z  * [new branch]              sve-poc                     -> origin/sve-poc
2025-12-04T09:17:18.9875218Z  * [new branch]              switch-bn                   -> origin/switch-bn
2025-12-04T09:17:18.9877197Z  * [new branch]              sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop
2025-12-04T09:17:18.9879098Z  * [new branch]              sy_aot_eager_record         -> origin/sy_aot_eager_record
2025-12-04T09:17:18.9881895Z  * [new branch]              sy_custom_bucketing         -> origin/sy_custom_bucketing
2025-12-04T09:17:18.9883945Z  * [new branch]              sy_debug_mode_test          -> origin/sy_debug_mode_test
2025-12-04T09:17:18.9885263Z  * [new branch]              sy_deserialize              -> origin/sy_deserialize
2025-12-04T09:17:18.9887252Z  * [new branch]              sy_dump_gm_code             -> origin/sy_dump_gm_code
2025-12-04T09:17:18.9889102Z  * [new branch]              sy_exp                      -> origin/sy_exp
2025-12-04T09:17:18.9891105Z  * [new branch]              sy_export_annotation        -> origin/sy_export_annotation
2025-12-04T09:17:18.9893060Z  * [new branch]              sy_invoke_subgraph          -> origin/sy_invoke_subgraph
2025-12-04T09:17:18.9894992Z  * [new branch]              sy_kernel_bw_name           -> origin/sy_kernel_bw_name
2025-12-04T09:17:18.9896866Z  * [new branch]              sy_multi_arch               -> origin/sy_multi_arch
2025-12-04T09:17:18.9898827Z  * [new branch]              sy_nn_module_stack          -> origin/sy_nn_module_stack
2025-12-04T09:17:18.9901001Z  * [new branch]              sy_original_dtensor         -> origin/sy_original_dtensor
2025-12-04T09:17:18.9902911Z  * [new branch]              sy_profiler_cia             -> origin/sy_profiler_cia
2025-12-04T09:17:18.9904767Z  * [new branch]              symm_mem_sync               -> origin/symm_mem_sync
2025-12-04T09:17:18.9906842Z  * [new branch]              sympy-bottleneck-repro      -> origin/sympy-bottleneck-repro
2025-12-04T09:17:18.9908831Z  * [new branch]              tensordict_integration      -> origin/tensordict_integration
2025-12-04T09:17:18.9912762Z  * [new branch]              test-move-conda-builds      -> origin/test-move-conda-builds
2025-12-04T09:17:18.9914582Z  * [new branch]              test-old                    -> origin/test-old
2025-12-04T09:17:18.9917162Z  * [new branch]              test/bmm_heur               -> origin/test/bmm_heur
2025-12-04T09:17:18.9919743Z  * [new branch]              tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix
2025-12-04T09:17:18.9921615Z  * [new branch]              tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune
2025-12-04T09:17:18.9923238Z  * [new branch]              tianren/customOp_fusion     -> origin/tianren/customOp_fusion
2025-12-04T09:17:18.9925077Z  * [new branch]              tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark
2025-12-04T09:17:18.9926940Z  * [new branch]              tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix
2025-12-04T09:17:18.9929353Z  * [new branch]              tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config
2025-12-04T09:17:18.9931214Z  * [new branch]              tianren/dynamic_range_input -> origin/tianren/dynamic_range_input
2025-12-04T09:17:18.9933077Z  * [new branch]              tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix
2025-12-04T09:17:18.9934923Z  * [new branch]              tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge
2025-12-04T09:17:18.9936844Z  * [new branch]              tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp
2025-12-04T09:17:18.9938655Z  * [new branch]              tianren/fx_codegen_dump     -> origin/tianren/fx_codegen_dump
2025-12-04T09:17:18.9940672Z  * [new branch]              tianren/symmetric_memory    -> origin/tianren/symmetric_memory
2025-12-04T09:17:18.9942465Z  * [new branch]              tianren/test                -> origin/tianren/test
2025-12-04T09:17:18.9944407Z  * [new branch]              tidy_performance_cyy        -> origin/tidy_performance_cyy
2025-12-04T09:17:18.9946307Z  * [new branch]              tmp                         -> origin/tmp
2025-12-04T09:17:18.9948259Z  * [new branch]              torchtitan_ep               -> origin/torchtitan_ep
2025-12-04T09:17:18.9950218Z  * [new branch]              torchtitan_integration      -> origin/torchtitan_integration
2025-12-04T09:17:18.9952369Z  * [new branch]              trace_fsdp_torchtune_lora   -> origin/trace_fsdp_torchtune_lora
2025-12-04T09:17:18.9954106Z  * [new branch]              traceable_fsdp_unit_tests   -> origin/traceable_fsdp_unit_tests
2025-12-04T09:17:18.9956062Z  * [new branch]              tree_loop_vec_base          -> origin/tree_loop_vec_base
2025-12-04T09:17:18.9958055Z  * [new branch]              triton_kernel               -> origin/triton_kernel
2025-12-04T09:17:18.9959977Z  * [new branch]              tt_pkg_1908                 -> origin/tt_pkg_1908
2025-12-04T09:17:18.9961887Z  * [new branch]              type_dec                    -> origin/type_dec
2025-12-04T09:17:18.9963858Z  * [new branch]              udate-sphinx-dependancies   -> origin/udate-sphinx-dependancies
2025-12-04T09:17:18.9966706Z  * [new branch]              update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1
2025-12-04T09:17:18.9968423Z  * [new branch]              update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1
2025-12-04T09:17:18.9970239Z  * [new branch]              update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1
2025-12-04T09:17:18.9971997Z  * [new branch]              update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1
2025-12-04T09:17:18.9973776Z  * [new branch]              update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1
2025-12-04T09:17:18.9975855Z  * [new branch]              update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1
2025-12-04T09:17:18.9978431Z  * [new branch]              update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2
2025-12-04T09:17:18.9981198Z  * [new branch]              update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1
2025-12-04T09:17:18.9982954Z  * [new branch]              update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1
2025-12-04T09:17:18.9984567Z  * [new branch]              update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1
2025-12-04T09:17:18.9986386Z  * [new branch]              update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1
2025-12-04T09:17:18.9988174Z  * [new branch]              update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1
2025-12-04T09:17:18.9990713Z  * [new branch]              update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1
2025-12-04T09:17:18.9992621Z  * [new branch]              update-vllm-dockerfile      -> origin/update-vllm-dockerfile
2025-12-04T09:17:18.9995283Z  * [new branch]              update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1
2025-12-04T09:17:18.9997060Z  * [new branch]              update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1
2025-12-04T09:17:18.9998824Z  * [new branch]              update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1
2025-12-04T09:17:19.0000855Z  * [new branch]              update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388
2025-12-04T09:17:19.0002660Z  * [new branch]              update_operator_readme      -> origin/update_operator_readme
2025-12-04T09:17:19.0004628Z  * [new branch]              update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736
2025-12-04T09:17:19.0006594Z  * [new branch]              update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173
2025-12-04T09:17:19.0008830Z  * [new branch]              update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677
2025-12-04T09:17:19.0010886Z  * [new branch]              update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283
2025-12-04T09:17:19.0013250Z  * [new branch]              update_submodule_FBGEMM     -> origin/update_submodule_FBGEMM
2025-12-04T09:17:19.0014696Z  * [new branch]              update_submodule_kineto     -> origin/update_submodule_kineto
2025-12-04T09:17:19.0016657Z  * [new branch]              update_submodule_tensorpipe -> origin/update_submodule_tensorpipe
2025-12-04T09:17:19.0018541Z  * [new branch]              upload-tests-for-autorevert -> origin/upload-tests-for-autorevert
2025-12-04T09:17:19.0020664Z  * [new branch]              v0.1.2                      -> origin/v0.1.2
2025-12-04T09:17:19.0022723Z  * [new branch]              v1.0.1                      -> origin/v1.0.1
2025-12-04T09:17:19.0024801Z  * [new branch]              v1.0.3                      -> origin/v1.0.3
2025-12-04T09:17:19.0027059Z  * [new branch]              v1.1.0                      -> origin/v1.1.0
2025-12-04T09:17:19.0029146Z  * [new branch]              v1.2.0                      -> origin/v1.2.0
2025-12-04T09:17:19.0031109Z  * [new branch]              v1.3.0                      -> origin/v1.3.0
2025-12-04T09:17:19.0033109Z  * [new branch]              v1.3.1                      -> origin/v1.3.1
2025-12-04T09:17:19.0035054Z  * [new branch]              validate_fn                 -> origin/validate_fn
2025-12-04T09:17:19.0037157Z  * [new branch]              validations_2.6             -> origin/validations_2.6
2025-12-04T09:17:19.0039143Z  * [new branch]              validations_2.8             -> origin/validations_2.8
2025-12-04T09:17:19.0041133Z  * [new branch]              varlen-api                  -> origin/varlen-api
2025-12-04T09:17:19.0043070Z  * [new branch]              varlen-api-backup           -> origin/varlen-api-backup
2025-12-04T09:17:19.0045472Z  * [new branch]              varlen_batch_invariance     -> origin/varlen_batch_invariance
2025-12-04T09:17:19.0047794Z  * [new branch]              viable/strict               -> origin/viable/strict
2025-12-04T09:17:19.0050542Z  * [new branch]              vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy
2025-12-04T09:17:19.0052420Z  * [new branch]              vllmbuildci                 -> origin/vllmbuildci
2025-12-04T09:17:19.0054874Z  * [new branch]              vllmpin                     -> origin/vllmpin
2025-12-04T09:17:19.0056957Z  * [new branch]              vscode-recommend-pyrefly    -> origin/vscode-recommend-pyrefly
2025-12-04T09:17:19.0059033Z  * [new branch]              wdvr-patch-1                -> origin/wdvr-patch-1
2025-12-04T09:17:19.0061709Z  * [new branch]              wdvr/iss_145259             -> origin/wdvr/iss_145259
2025-12-04T09:17:19.0064193Z  * [new branch]              whc/pei                     -> origin/whc/pei
2025-12-04T09:17:19.0065882Z  * [new branch]              whc/pp_fix                  -> origin/whc/pp_fix
2025-12-04T09:17:19.0067724Z  * [new branch]              whc/sharding                -> origin/whc/sharding
2025-12-04T09:17:19.0069540Z  * [new branch]              whc/sharding2               -> origin/whc/sharding2
2025-12-04T09:17:19.0071227Z  * [new branch]              whc/uneven                  -> origin/whc/uneven
2025-12-04T09:17:19.0073301Z  * [new branch]              whc/uneven-merge            -> origin/whc/uneven-merge
2025-12-04T09:17:19.0075215Z  * [new branch]              win_warnings                -> origin/win_warnings
2025-12-04T09:17:19.0077461Z  * [new branch]              windows_libtorch_free       -> origin/windows_libtorch_free
2025-12-04T09:17:19.0079478Z  * [new branch]              xmfan-war                   -> origin/xmfan-war
2025-12-04T09:17:19.0082117Z  * [new branch]              xmfan/ca_0516               -> origin/xmfan/ca_0516
2025-12-04T09:17:19.0083886Z  * [new branch]              xmfan/ca_1051b93192         -> origin/xmfan/ca_1051b93192
2025-12-04T09:17:19.0085931Z  * [new branch]              xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8
2025-12-04T09:17:19.0087313Z  * [new branch]              xmfan/ca_5a2be192d1         -> origin/xmfan/ca_5a2be192d1
2025-12-04T09:17:19.0089024Z  * [new branch]              xmfan/ca_9d59b516e9         -> origin/xmfan/ca_9d59b516e9
2025-12-04T09:17:19.0090694Z  * [new branch]              xmfan/ca_apr8               -> origin/xmfan/ca_apr8
2025-12-04T09:17:19.0092513Z  * [new branch]              xmfan/ca_base               -> origin/xmfan/ca_base
2025-12-04T09:17:19.0094613Z  * [new branch]              xmfan/ca_dynamic            -> origin/xmfan/ca_dynamic
2025-12-04T09:17:19.0096801Z  * [new branch]              xmfan/ca_fix_dyn            -> origin/xmfan/ca_fix_dyn
2025-12-04T09:17:19.0098663Z  * [new branch]              xmfan/ca_fix_lowering       -> origin/xmfan/ca_fix_lowering
2025-12-04T09:17:19.0100645Z  * [new branch]              xmfan/ca_fix_polyfills      -> origin/xmfan/ca_fix_polyfills
2025-12-04T09:17:19.0102349Z  * [new branch]              xmfan/ca_jan3               -> origin/xmfan/ca_jan3
2025-12-04T09:17:19.0104120Z  * [new branch]              xmfan/ca_jun18              -> origin/xmfan/ca_jun18
2025-12-04T09:17:19.0105958Z  * [new branch]              xmfan/ca_jun24              -> origin/xmfan/ca_jun24
2025-12-04T09:17:19.0107886Z  * [new branch]              xmfan/ca_nested             -> origin/xmfan/ca_nested
2025-12-04T09:17:19.0112875Z  * [new branch]              xmfan/ca_overhead           -> origin/xmfan/ca_overhead
2025-12-04T09:17:19.0113227Z  * [new branch]              xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451
2025-12-04T09:17:19.0113598Z  * [new branch]              xmfan/cacu_jun18            -> origin/xmfan/cacu_jun18
2025-12-04T09:17:19.0115650Z  * [new branch]              xmfan/cacu_jun19            -> origin/xmfan/cacu_jun19
2025-12-04T09:17:19.0117497Z  * [new branch]              xmfan/cacu_jun4             -> origin/xmfan/cacu_jun4
2025-12-04T09:17:19.0119400Z  * [new branch]              xmfan/disable_duck_shape    -> origin/xmfan/disable_duck_shape
2025-12-04T09:17:19.0121277Z  * [new branch]              xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough
2025-12-04T09:17:19.0123324Z  * [new branch]              xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T09:17:19.0125286Z  * [new branch]              xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T09:17:19.0126828Z  * [new branch]              xmfan/single_step           -> origin/xmfan/single_step
2025-12-04T09:17:19.0128640Z  * [new branch]              xmfan/sth_0829              -> origin/xmfan/sth_0829
2025-12-04T09:17:19.0130518Z  * [new branch]              xmfan/test                  -> origin/xmfan/test
2025-12-04T09:17:19.0133181Z  * [new branch]              yguo/debug-0226-constexpr   -> origin/yguo/debug-0226-constexpr
2025-12-04T09:17:19.0134882Z  * [new branch]              yguo/new_latest_changes     -> origin/yguo/new_latest_changes
2025-12-04T09:17:19.0136673Z  * [new branch]              yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes
2025-12-04T09:17:19.0139257Z  * [new branch]              yiming/bootcamp             -> origin/yiming/bootcamp
2025-12-04T09:17:19.0141253Z  * [new branch]              yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop
2025-12-04T09:17:19.0143017Z  * [new branch]              yolo-llama3                 -> origin/yolo-llama3
2025-12-04T09:17:19.0145575Z  * [new branch]              zainr/canary-test           -> origin/zainr/canary-test
2025-12-04T09:17:19.0147496Z  * [new branch]              zainr/cleanup-gh-runners    -> origin/zainr/cleanup-gh-runners
2025-12-04T09:17:19.0149183Z  * [new branch]              zainr/pull-migration-c      -> origin/zainr/pull-migration-c
2025-12-04T09:17:19.0150842Z  * [new branch]              zainr/test2                 -> origin/zainr/test2
2025-12-04T09:17:19.0153039Z  * [new branch]              zasdfgbnm-patch-3           -> origin/zasdfgbnm-patch-3
2025-12-04T09:17:19.0154847Z  * [new branch]              zb2p                        -> origin/zb2p
2025-12-04T09:17:19.0156814Z  * [new branch]              zeros-and-scatter-part2     -> origin/zeros-and-scatter-part2
2025-12-04T09:17:19.0159904Z  * [new branch]              zhxchen17/ci/vllm_lora_oom  -> origin/zhxchen17/ci/vllm_lora_oom
2025-12-04T09:17:19.0161677Z  * [new branch]              zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom
2025-12-04T09:17:19.0163724Z  * [new branch]              zhxchen17/ci/vllm_pin       -> origin/zhxchen17/ci/vllm_pin
2025-12-04T09:17:19.0166224Z  * [new branch]              zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards
2025-12-04T09:17:19.0168598Z  * [new branch]              zhxchen17/export/call_override -> origin/zhxchen17/export/call_override
2025-12-04T09:17:19.0170893Z  * [new branch]              zhxchen17/export/codemod1   -> origin/zhxchen17/export/codemod1
2025-12-04T09:17:19.0172726Z  * [new branch]              zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return
2025-12-04T09:17:19.0174676Z  * [new branch]              zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn
2025-12-04T09:17:19.0176351Z  * [new branch]              zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check
2025-12-04T09:17:19.0178826Z  * [new branch]              zhxchen17/precompile/aoti   -> origin/zhxchen17/precompile/aoti
2025-12-04T09:17:19.0180856Z  * [new branch]              zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals
2025-12-04T09:17:19.0182706Z  * [new branch]              zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards
2025-12-04T09:17:19.0185031Z  * [new branch]              zhxchen17/scratch/0         -> origin/zhxchen17/scratch/0
2025-12-04T09:17:19.0186937Z  * [new branch]              zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update
2025-12-04T09:17:19.0189541Z  * [new branch]              zhxhcen17/moodycamel        -> origin/zhxhcen17/moodycamel
2025-12-04T09:17:19.0192095Z  * [new branch]              zxiiro/build-times          -> origin/zxiiro/build-times
2025-12-04T09:17:19.0193944Z  * [new branch]              zxiiro/c7i.2xlarge          -> origin/zxiiro/c7i.2xlarge
2025-12-04T09:17:19.0195755Z  * [new branch]              zxiiro/c7i.2xlarge.h100     -> origin/zxiiro/c7i.2xlarge.h100
2025-12-04T09:17:19.0197600Z  * [new branch]              zxiiro/main                 -> origin/zxiiro/main
2025-12-04T09:17:19.0199338Z  * [new branch]              zxiiro/risc64               -> origin/zxiiro/risc64
2025-12-04T09:17:19.0201171Z  * [new branch]              zxiiro/test-multicloud-arc  -> origin/zxiiro/test-multicloud-arc
2025-12-04T09:17:19.0202884Z  * [new tag]                 bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug
2025-12-04T09:17:19.0204358Z  * [new tag]                 ci/binaries/77164           -> ci/binaries/77164
2025-12-04T09:17:19.0206064Z  * [new tag]                 ciflow/b200/115316          -> ciflow/b200/115316
2025-12-04T09:17:19.0207277Z  * [new tag]                 ciflow/b200/160685          -> ciflow/b200/160685
2025-12-04T09:17:19.0208702Z  * [new tag]                 ciflow/b200/161607          -> ciflow/b200/161607
2025-12-04T09:17:19.0212311Z  * [new tag]                 ciflow/b200/161938          -> ciflow/b200/161938
2025-12-04T09:17:19.0213628Z  * [new tag]                 ciflow/b200/167207          -> ciflow/b200/167207
2025-12-04T09:17:19.0214869Z  * [new tag]                 ciflow/b200/167989          -> ciflow/b200/167989
2025-12-04T09:17:19.0216276Z  * [new tag]                 ciflow/b200/168096          -> ciflow/b200/168096
2025-12-04T09:17:19.0217718Z  * [new tag]                 ciflow/b200/168175          -> ciflow/b200/168175
2025-12-04T09:17:19.0219172Z  * [new tag]                 ciflow/b200/168195          -> ciflow/b200/168195
2025-12-04T09:17:19.0220563Z  * [new tag]                 ciflow/b200/169200          -> ciflow/b200/169200
2025-12-04T09:17:19.0221900Z  * [new tag]                 ciflow/b200/169216          -> ciflow/b200/169216
2025-12-04T09:17:19.0223657Z  * [new tag]                 ciflow/b200/169380          -> ciflow/b200/169380
2025-12-04T09:17:19.0225478Z  * [new tag]                 ciflow/b200/169412          -> ciflow/b200/169412
2025-12-04T09:17:19.0227018Z  * [new tag]                 ciflow/b200/169470          -> ciflow/b200/169470
2025-12-04T09:17:19.0228837Z  * [new tag]                 ciflow/b200/169471          -> ciflow/b200/169471
2025-12-04T09:17:19.0230280Z  * [new tag]                 ciflow/b200/169472          -> ciflow/b200/169472
2025-12-04T09:17:19.0231757Z  * [new tag]                 ciflow/b200/169514          -> ciflow/b200/169514
2025-12-04T09:17:19.0233051Z  * [new tag]                 ciflow/b200/169517          -> ciflow/b200/169517
2025-12-04T09:17:19.0234750Z  * [new tag]                 ciflow/binaries/165922      -> ciflow/binaries/165922
2025-12-04T09:17:19.0236070Z  * [new tag]                 ciflow/binaries/169510      -> ciflow/binaries/169510
2025-12-04T09:17:19.0237790Z  * [new tag]                 ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994
2025-12-04T09:17:19.0239353Z  * [new tag]                 ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829
2025-12-04T09:17:19.0240487Z  * [new tag]                 ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972
2025-12-04T09:17:19.0242090Z  * [new tag]                 ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981
2025-12-04T09:17:19.0243537Z  * [new tag]                 ciflow/dynamo/167695        -> ciflow/dynamo/167695
2025-12-04T09:17:19.0244752Z  * [new tag]                 ciflow/dynamo/168096        -> ciflow/dynamo/168096
2025-12-04T09:17:19.0246155Z  * [new tag]                 ciflow/dynamo/169525        -> ciflow/dynamo/169525
2025-12-04T09:17:19.0247720Z  * [new tag]                 ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938
2025-12-04T09:17:19.0248721Z  * [new tag]                 ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940
2025-12-04T09:17:19.0250528Z  * [new tag]                 ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923
2025-12-04T09:17:19.0252055Z  * [new tag]                 ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552
2025-12-04T09:17:19.0253160Z  * [new tag]                 ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129
2025-12-04T09:17:19.0254483Z  * [new tag]                 ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917
2025-12-04T09:17:19.0256034Z  * [new tag]                 ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156
2025-12-04T09:17:19.0257279Z  * [new tag]                 ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200
2025-12-04T09:17:19.0258609Z  * [new tag]                 ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216
2025-12-04T09:17:19.0259975Z  * [new tag]                 ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338
2025-12-04T09:17:19.0261380Z  * [new tag]                 ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355
2025-12-04T09:17:19.0262502Z  * [new tag]                 ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543
2025-12-04T09:17:19.0264024Z  * [new tag]                 ciflow/h100/115316          -> ciflow/h100/115316
2025-12-04T09:17:19.0265273Z  * [new tag]                 ciflow/h100/160685          -> ciflow/h100/160685
2025-12-04T09:17:19.0266490Z  * [new tag]                 ciflow/h100/160729          -> ciflow/h100/160729
2025-12-04T09:17:19.0267751Z  * [new tag]                 ciflow/h100/161607          -> ciflow/h100/161607
2025-12-04T09:17:19.0268973Z  * [new tag]                 ciflow/h100/161938          -> ciflow/h100/161938
2025-12-04T09:17:19.0270325Z  * [new tag]                 ciflow/h100/167207          -> ciflow/h100/167207
2025-12-04T09:17:19.0271229Z  * [new tag]                 ciflow/h100/167989          -> ciflow/h100/167989
2025-12-04T09:17:19.0272682Z  * [new tag]                 ciflow/h100/168096          -> ciflow/h100/168096
2025-12-04T09:17:19.0273666Z  * [new tag]                 ciflow/h100/168175          -> ciflow/h100/168175
2025-12-04T09:17:19.0275114Z  * [new tag]                 ciflow/h100/168195          -> ciflow/h100/168195
2025-12-04T09:17:19.0276337Z  * [new tag]                 ciflow/h100/168980          -> ciflow/h100/168980
2025-12-04T09:17:19.0277927Z  * [new tag]                 ciflow/h100/169200          -> ciflow/h100/169200
2025-12-04T09:17:19.0279575Z  * [new tag]                 ciflow/h100/169216          -> ciflow/h100/169216
2025-12-04T09:17:19.0281075Z  * [new tag]                 ciflow/h100/169380          -> ciflow/h100/169380
2025-12-04T09:17:19.0282372Z  * [new tag]                 ciflow/h100/169412          -> ciflow/h100/169412
2025-12-04T09:17:19.0283668Z  * [new tag]                 ciflow/h100/169470          -> ciflow/h100/169470
2025-12-04T09:17:19.0284970Z  * [new tag]                 ciflow/h100/169471          -> ciflow/h100/169471
2025-12-04T09:17:19.0286234Z  * [new tag]                 ciflow/h100/169472          -> ciflow/h100/169472
2025-12-04T09:17:19.0287550Z  * [new tag]                 ciflow/h100/169514          -> ciflow/h100/169514
2025-12-04T09:17:19.0289097Z  * [new tag]                 ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096
2025-12-04T09:17:19.0291036Z  * [new tag]                 ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096
2025-12-04T09:17:19.0292483Z  * [new tag]                 ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165
2025-12-04T09:17:19.0294204Z  * [new tag]                 ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096
2025-12-04T09:17:19.0295835Z  * [new tag]                 ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096
2025-12-04T09:17:19.0297741Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073
2025-12-04T09:17:19.0298756Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096
2025-12-04T09:17:19.0300472Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024
2025-12-04T09:17:19.0302099Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024
2025-12-04T09:17:19.0303180Z  * [new tag]                 ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096
2025-12-04T09:17:19.0305228Z  * [new tag]                 ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096
2025-12-04T09:17:19.0305991Z  * [new tag]                 ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024
2025-12-04T09:17:19.0307541Z  * [new tag]                 ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425
2025-12-04T09:17:19.0309347Z  * [new tag]                 ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545
2025-12-04T09:17:19.0310679Z  * [new tag]                 ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997
2025-12-04T09:17:19.0312461Z  * [new tag]                 ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096
2025-12-04T09:17:19.0313803Z  * [new tag]                 ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063
2025-12-04T09:17:19.0314787Z  * [new tag]                 ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425
2025-12-04T09:17:19.0316720Z  * [new tag]                 ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545
2025-12-04T09:17:19.0317605Z  * [new tag]                 ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096
2025-12-04T09:17:19.0319081Z  * [new tag]                 ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063
2025-12-04T09:17:19.0320062Z  * [new tag]                 ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425
2025-12-04T09:17:19.0321964Z  * [new tag]                 ciflow/inductor-rocm/162052 -> ciflow/inductor-rocm/162052
2025-12-04T09:17:19.0323278Z  * [new tag]                 ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971
2025-12-04T09:17:19.0324786Z  * [new tag]                 ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096
2025-12-04T09:17:19.0326248Z  * [new tag]                 ciflow/inductor/144542      -> ciflow/inductor/144542
2025-12-04T09:17:19.0327468Z  * [new tag]                 ciflow/inductor/146506      -> ciflow/inductor/146506
2025-12-04T09:17:19.0329116Z  * [new tag]                 ciflow/inductor/147990      -> ciflow/inductor/147990
2025-12-04T09:17:19.0330553Z  * [new tag]                 ciflow/inductor/148294      -> ciflow/inductor/148294
2025-12-04T09:17:19.0331815Z  * [new tag]                 ciflow/inductor/148492      -> ciflow/inductor/148492
2025-12-04T09:17:19.0333068Z  * [new tag]                 ciflow/inductor/157149      -> ciflow/inductor/157149
2025-12-04T09:17:19.0334354Z  * [new tag]                 ciflow/inductor/157994      -> ciflow/inductor/157994
2025-12-04T09:17:19.0335326Z  * [new tag]                 ciflow/inductor/160685      -> ciflow/inductor/160685
2025-12-04T09:17:19.0336810Z  * [new tag]                 ciflow/inductor/160686      -> ciflow/inductor/160686
2025-12-04T09:17:19.0338123Z  * [new tag]                 ciflow/inductor/160687      -> ciflow/inductor/160687
2025-12-04T09:17:19.0339646Z  * [new tag]                 ciflow/inductor/160688      -> ciflow/inductor/160688
2025-12-04T09:17:19.0341301Z  * [new tag]                 ciflow/inductor/160706      -> ciflow/inductor/160706
2025-12-04T09:17:19.0343000Z  * [new tag]                 ciflow/inductor/160729      -> ciflow/inductor/160729
2025-12-04T09:17:19.0344572Z  * [new tag]                 ciflow/inductor/161938      -> ciflow/inductor/161938
2025-12-04T09:17:19.0345940Z  * [new tag]                 ciflow/inductor/161939      -> ciflow/inductor/161939
2025-12-04T09:17:19.0347194Z  * [new tag]                 ciflow/inductor/161940      -> ciflow/inductor/161940
2025-12-04T09:17:19.0348528Z  * [new tag]                 ciflow/inductor/162052      -> ciflow/inductor/162052
2025-12-04T09:17:19.0349862Z  * [new tag]                 ciflow/inductor/162275      -> ciflow/inductor/162275
2025-12-04T09:17:19.0351175Z  * [new tag]                 ciflow/inductor/162795      -> ciflow/inductor/162795
2025-12-04T09:17:19.0352711Z  * [new tag]                 ciflow/inductor/163245      -> ciflow/inductor/163245
2025-12-04T09:17:19.0354043Z  * [new tag]                 ciflow/inductor/163335      -> ciflow/inductor/163335
2025-12-04T09:17:19.0355362Z  * [new tag]                 ciflow/inductor/163503      -> ciflow/inductor/163503
2025-12-04T09:17:19.0356672Z  * [new tag]                 ciflow/inductor/163942      -> ciflow/inductor/163942
2025-12-04T09:17:19.0358122Z  * [new tag]                 ciflow/inductor/165270      -> ciflow/inductor/165270
2025-12-04T09:17:19.0359435Z  * [new tag]                 ciflow/inductor/165274      -> ciflow/inductor/165274
2025-12-04T09:17:19.0360761Z  * [new tag]                 ciflow/inductor/165322      -> ciflow/inductor/165322
2025-12-04T09:17:19.0362079Z  * [new tag]                 ciflow/inductor/165597      -> ciflow/inductor/165597
2025-12-04T09:17:19.0363379Z  * [new tag]                 ciflow/inductor/166063      -> ciflow/inductor/166063
2025-12-04T09:17:19.0364706Z  * [new tag]                 ciflow/inductor/166075      -> ciflow/inductor/166075
2025-12-04T09:17:19.0366126Z  * [new tag]                 ciflow/inductor/166165      -> ciflow/inductor/166165
2025-12-04T09:17:19.0367555Z  * [new tag]                 ciflow/inductor/166254      -> ciflow/inductor/166254
2025-12-04T09:17:19.0368868Z  * [new tag]                 ciflow/inductor/166483      -> ciflow/inductor/166483
2025-12-04T09:17:19.0370164Z  * [new tag]                 ciflow/inductor/166494      -> ciflow/inductor/166494
2025-12-04T09:17:19.0371424Z  * [new tag]                 ciflow/inductor/166545      -> ciflow/inductor/166545
2025-12-04T09:17:19.0372884Z  * [new tag]                 ciflow/inductor/166788      -> ciflow/inductor/166788
2025-12-04T09:17:19.0374345Z  * [new tag]                 ciflow/inductor/166846      -> ciflow/inductor/166846
2025-12-04T09:17:19.0375687Z  * [new tag]                 ciflow/inductor/167300      -> ciflow/inductor/167300
2025-12-04T09:17:19.0377029Z  * [new tag]                 ciflow/inductor/167407      -> ciflow/inductor/167407
2025-12-04T09:17:19.0378462Z  * [new tag]                 ciflow/inductor/167536      -> ciflow/inductor/167536
2025-12-04T09:17:19.0379898Z  * [new tag]                 ciflow/inductor/167552      -> ciflow/inductor/167552
2025-12-04T09:17:19.0381187Z  * [new tag]                 ciflow/inductor/167555      -> ciflow/inductor/167555
2025-12-04T09:17:19.0382609Z  * [new tag]                 ciflow/inductor/167583      -> ciflow/inductor/167583
2025-12-04T09:17:19.0383903Z  * [new tag]                 ciflow/inductor/167599      -> ciflow/inductor/167599
2025-12-04T09:17:19.0385244Z  * [new tag]                 ciflow/inductor/167647      -> ciflow/inductor/167647
2025-12-04T09:17:19.0386569Z  * [new tag]                 ciflow/inductor/167677      -> ciflow/inductor/167677
2025-12-04T09:17:19.0387886Z  * [new tag]                 ciflow/inductor/167680      -> ciflow/inductor/167680
2025-12-04T09:17:19.0389207Z  * [new tag]                 ciflow/inductor/167695      -> ciflow/inductor/167695
2025-12-04T09:17:19.0390519Z  * [new tag]                 ciflow/inductor/167742      -> ciflow/inductor/167742
2025-12-04T09:17:19.0391814Z  * [new tag]                 ciflow/inductor/167768      -> ciflow/inductor/167768
2025-12-04T09:17:19.0393353Z  * [new tag]                 ciflow/inductor/167773      -> ciflow/inductor/167773
2025-12-04T09:17:19.0394726Z  * [new tag]                 ciflow/inductor/167781      -> ciflow/inductor/167781
2025-12-04T09:17:19.0396020Z  * [new tag]                 ciflow/inductor/167880      -> ciflow/inductor/167880
2025-12-04T09:17:19.0397351Z  * [new tag]                 ciflow/inductor/167887      -> ciflow/inductor/167887
2025-12-04T09:17:19.0399194Z  * [new tag]                 ciflow/inductor/167972      -> ciflow/inductor/167972
2025-12-04T09:17:19.0400492Z  * [new tag]                 ciflow/inductor/167989      -> ciflow/inductor/167989
2025-12-04T09:17:19.0401807Z  * [new tag]                 ciflow/inductor/168002      -> ciflow/inductor/168002
2025-12-04T09:17:19.0403114Z  * [new tag]                 ciflow/inductor/168050      -> ciflow/inductor/168050
2025-12-04T09:17:19.0404470Z  * [new tag]                 ciflow/inductor/168051      -> ciflow/inductor/168051
2025-12-04T09:17:19.0405788Z  * [new tag]                 ciflow/inductor/168052      -> ciflow/inductor/168052
2025-12-04T09:17:19.0407094Z  * [new tag]                 ciflow/inductor/168073      -> ciflow/inductor/168073
2025-12-04T09:17:19.0408190Z  * [new tag]                 ciflow/inductor/168096      -> ciflow/inductor/168096
2025-12-04T09:17:19.0409959Z  * [new tag]                 ciflow/inductor/168114      -> ciflow/inductor/168114
2025-12-04T09:17:19.0411227Z  * [new tag]                 ciflow/inductor/168115      -> ciflow/inductor/168115
2025-12-04T09:17:19.0412535Z  * [new tag]                 ciflow/inductor/168127      -> ciflow/inductor/168127
2025-12-04T09:17:19.0413842Z  * [new tag]                 ciflow/inductor/168129      -> ciflow/inductor/168129
2025-12-04T09:17:19.0415223Z  * [new tag]                 ciflow/inductor/168157      -> ciflow/inductor/168157
2025-12-04T09:17:19.0416729Z  * [new tag]                 ciflow/inductor/168175      -> ciflow/inductor/168175
2025-12-04T09:17:19.0417668Z  * [new tag]                 ciflow/inductor/168185      -> ciflow/inductor/168185
2025-12-04T09:17:19.0419271Z  * [new tag]                 ciflow/inductor/168195      -> ciflow/inductor/168195
2025-12-04T09:17:19.0420707Z  * [new tag]                 ciflow/inductor/168209      -> ciflow/inductor/168209
2025-12-04T09:17:19.0421952Z  * [new tag]                 ciflow/inductor/168266      -> ciflow/inductor/168266
2025-12-04T09:17:19.0423236Z  * [new tag]                 ciflow/inductor/168316      -> ciflow/inductor/168316
2025-12-04T09:17:19.0424721Z  * [new tag]                 ciflow/inductor/168326      -> ciflow/inductor/168326
2025-12-04T09:17:19.0426067Z  * [new tag]                 ciflow/inductor/168368      -> ciflow/inductor/168368
2025-12-04T09:17:19.0427438Z  * [new tag]                 ciflow/inductor/168894      -> ciflow/inductor/168894
2025-12-04T09:17:19.0428780Z  * [new tag]                 ciflow/inductor/168934      -> ciflow/inductor/168934
2025-12-04T09:17:19.0430069Z  * [new tag]                 ciflow/inductor/168939      -> ciflow/inductor/168939
2025-12-04T09:17:19.0431446Z  * [new tag]                 ciflow/inductor/168946      -> ciflow/inductor/168946
2025-12-04T09:17:19.0432706Z  * [new tag]                 ciflow/inductor/168950      -> ciflow/inductor/168950
2025-12-04T09:17:19.0434039Z  * [new tag]                 ciflow/inductor/168951      -> ciflow/inductor/168951
2025-12-04T09:17:19.0435364Z  * [new tag]                 ciflow/inductor/168952      -> ciflow/inductor/168952
2025-12-04T09:17:19.0436662Z  * [new tag]                 ciflow/inductor/168955      -> ciflow/inductor/168955
2025-12-04T09:17:19.0437966Z  * [new tag]                 ciflow/inductor/168971      -> ciflow/inductor/168971
2025-12-04T09:17:19.0439281Z  * [new tag]                 ciflow/inductor/168979      -> ciflow/inductor/168979
2025-12-04T09:17:19.0440603Z  * [new tag]                 ciflow/inductor/168980      -> ciflow/inductor/168980
2025-12-04T09:17:19.0442067Z  * [new tag]                 ciflow/inductor/168983      -> ciflow/inductor/168983
2025-12-04T09:17:19.0443363Z  * [new tag]                 ciflow/inductor/169006      -> ciflow/inductor/169006
2025-12-04T09:17:19.0444754Z  * [new tag]                 ciflow/inductor/169023      -> ciflow/inductor/169023
2025-12-04T09:17:19.0446100Z  * [new tag]                 ciflow/inductor/169024      -> ciflow/inductor/169024
2025-12-04T09:17:19.0447450Z  * [new tag]                 ciflow/inductor/169025      -> ciflow/inductor/169025
2025-12-04T09:17:19.0448753Z  * [new tag]                 ciflow/inductor/169066      -> ciflow/inductor/169066
2025-12-04T09:17:19.0450076Z  * [new tag]                 ciflow/inductor/169091      -> ciflow/inductor/169091
2025-12-04T09:17:19.0451415Z  * [new tag]                 ciflow/inductor/169102      -> ciflow/inductor/169102
2025-12-04T09:17:19.0452705Z  * [new tag]                 ciflow/inductor/169103      -> ciflow/inductor/169103
2025-12-04T09:17:19.0454037Z  * [new tag]                 ciflow/inductor/169121      -> ciflow/inductor/169121
2025-12-04T09:17:19.0455348Z  * [new tag]                 ciflow/inductor/169134      -> ciflow/inductor/169134
2025-12-04T09:17:19.0456658Z  * [new tag]                 ciflow/inductor/169135      -> ciflow/inductor/169135
2025-12-04T09:17:19.0457947Z  * [new tag]                 ciflow/inductor/169141      -> ciflow/inductor/169141
2025-12-04T09:17:19.0459492Z  * [new tag]                 ciflow/inductor/169151      -> ciflow/inductor/169151
2025-12-04T09:17:19.0460997Z  * [new tag]                 ciflow/inductor/169161      -> ciflow/inductor/169161
2025-12-04T09:17:19.0462311Z  * [new tag]                 ciflow/inductor/169167      -> ciflow/inductor/169167
2025-12-04T09:17:19.0463802Z  * [new tag]                 ciflow/inductor/169177      -> ciflow/inductor/169177
2025-12-04T09:17:19.0465398Z  * [new tag]                 ciflow/inductor/169185      -> ciflow/inductor/169185
2025-12-04T09:17:19.0466638Z  * [new tag]                 ciflow/inductor/169196      -> ciflow/inductor/169196
2025-12-04T09:17:19.0467955Z  * [new tag]                 ciflow/inductor/169200      -> ciflow/inductor/169200
2025-12-04T09:17:19.0469261Z  * [new tag]                 ciflow/inductor/169204      -> ciflow/inductor/169204
2025-12-04T09:17:19.0470503Z  * [new tag]                 ciflow/inductor/169216      -> ciflow/inductor/169216
2025-12-04T09:17:19.0471916Z  * [new tag]                 ciflow/inductor/169219      -> ciflow/inductor/169219
2025-12-04T09:17:19.0473232Z  * [new tag]                 ciflow/inductor/169220      -> ciflow/inductor/169220
2025-12-04T09:17:19.0474674Z  * [new tag]                 ciflow/inductor/169230      -> ciflow/inductor/169230
2025-12-04T09:17:19.0475986Z  * [new tag]                 ciflow/inductor/169242      -> ciflow/inductor/169242
2025-12-04T09:17:19.0477309Z  * [new tag]                 ciflow/inductor/169245      -> ciflow/inductor/169245
2025-12-04T09:17:19.0478770Z  * [new tag]                 ciflow/inductor/169260      -> ciflow/inductor/169260
2025-12-04T09:17:19.0480114Z  * [new tag]                 ciflow/inductor/169282      -> ciflow/inductor/169282
2025-12-04T09:17:19.0481422Z  * [new tag]                 ciflow/inductor/169286      -> ciflow/inductor/169286
2025-12-04T09:17:19.0482728Z  * [new tag]                 ciflow/inductor/169299      -> ciflow/inductor/169299
2025-12-04T09:17:19.0484179Z  * [new tag]                 ciflow/inductor/169304      -> ciflow/inductor/169304
2025-12-04T09:17:19.0486413Z  * [new tag]                 ciflow/inductor/169305      -> ciflow/inductor/169305
2025-12-04T09:17:19.0487732Z  * [new tag]                 ciflow/inductor/169308      -> ciflow/inductor/169308
2025-12-04T09:17:19.0489056Z  * [new tag]                 ciflow/inductor/169319      -> ciflow/inductor/169319
2025-12-04T09:17:19.0490411Z  * [new tag]                 ciflow/inductor/169326      -> ciflow/inductor/169326
2025-12-04T09:17:19.0491723Z  * [new tag]                 ciflow/inductor/169332      -> ciflow/inductor/169332
2025-12-04T09:17:19.0493052Z  * [new tag]                 ciflow/inductor/169333      -> ciflow/inductor/169333
2025-12-04T09:17:19.0494571Z  * [new tag]                 ciflow/inductor/169336      -> ciflow/inductor/169336
2025-12-04T09:17:19.0495945Z  * [new tag]                 ciflow/inductor/169340      -> ciflow/inductor/169340
2025-12-04T09:17:19.0497266Z  * [new tag]                 ciflow/inductor/169341      -> ciflow/inductor/169341
2025-12-04T09:17:19.0498588Z  * [new tag]                 ciflow/inductor/169343      -> ciflow/inductor/169343
2025-12-04T09:17:19.0500025Z  * [new tag]                 ciflow/inductor/169346      -> ciflow/inductor/169346
2025-12-04T09:17:19.0501528Z  * [new tag]                 ciflow/inductor/169348      -> ciflow/inductor/169348
2025-12-04T09:17:19.0503230Z  * [new tag]                 ciflow/inductor/169350      -> ciflow/inductor/169350
2025-12-04T09:17:19.0504662Z  * [new tag]                 ciflow/inductor/169355      -> ciflow/inductor/169355
2025-12-04T09:17:19.0506023Z  * [new tag]                 ciflow/inductor/169370      -> ciflow/inductor/169370
2025-12-04T09:17:19.0507944Z  * [new tag]                 ciflow/inductor/169375      -> ciflow/inductor/169375
2025-12-04T09:17:19.0509236Z  * [new tag]                 ciflow/inductor/169389      -> ciflow/inductor/169389
2025-12-04T09:17:19.0510524Z  * [new tag]                 ciflow/inductor/169391      -> ciflow/inductor/169391
2025-12-04T09:17:19.0511828Z  * [new tag]                 ciflow/inductor/169393      -> ciflow/inductor/169393
2025-12-04T09:17:19.0513196Z  * [new tag]                 ciflow/inductor/169399      -> ciflow/inductor/169399
2025-12-04T09:17:19.0514649Z  * [new tag]                 ciflow/inductor/169400      -> ciflow/inductor/169400
2025-12-04T09:17:19.0515960Z  * [new tag]                 ciflow/inductor/169415      -> ciflow/inductor/169415
2025-12-04T09:17:19.0517452Z  * [new tag]                 ciflow/inductor/169417      -> ciflow/inductor/169417
2025-12-04T09:17:19.0518580Z  * [new tag]                 ciflow/inductor/169418      -> ciflow/inductor/169418
2025-12-04T09:17:19.0520211Z  * [new tag]                 ciflow/inductor/169430      -> ciflow/inductor/169430
2025-12-04T09:17:19.0521454Z  * [new tag]                 ciflow/inductor/169432      -> ciflow/inductor/169432
2025-12-04T09:17:19.0522893Z  * [new tag]                 ciflow/inductor/169436      -> ciflow/inductor/169436
2025-12-04T09:17:19.0524330Z  * [new tag]                 ciflow/inductor/169437      -> ciflow/inductor/169437
2025-12-04T09:17:19.0525681Z  * [new tag]                 ciflow/inductor/169438      -> ciflow/inductor/169438
2025-12-04T09:17:19.0527029Z  * [new tag]                 ciflow/inductor/169441      -> ciflow/inductor/169441
2025-12-04T09:17:19.0528339Z  * [new tag]                 ciflow/inductor/169446      -> ciflow/inductor/169446
2025-12-04T09:17:19.0529982Z  * [new tag]                 ciflow/inductor/169447      -> ciflow/inductor/169447
2025-12-04T09:17:19.0531313Z  * [new tag]                 ciflow/inductor/169452      -> ciflow/inductor/169452
2025-12-04T09:17:19.0532791Z  * [new tag]                 ciflow/inductor/169455      -> ciflow/inductor/169455
2025-12-04T09:17:19.0534132Z  * [new tag]                 ciflow/inductor/169459      -> ciflow/inductor/169459
2025-12-04T09:17:19.0535580Z  * [new tag]                 ciflow/inductor/169463      -> ciflow/inductor/169463
2025-12-04T09:17:19.0537066Z  * [new tag]                 ciflow/inductor/169476      -> ciflow/inductor/169476
2025-12-04T09:17:19.0538381Z  * [new tag]                 ciflow/inductor/169485      -> ciflow/inductor/169485
2025-12-04T09:17:19.0539847Z  * [new tag]                 ciflow/inductor/169493      -> ciflow/inductor/169493
2025-12-04T09:17:19.0541156Z  * [new tag]                 ciflow/inductor/169496      -> ciflow/inductor/169496
2025-12-04T09:17:19.0542453Z  * [new tag]                 ciflow/inductor/169497      -> ciflow/inductor/169497
2025-12-04T09:17:19.0543822Z  * [new tag]                 ciflow/inductor/169503      -> ciflow/inductor/169503
2025-12-04T09:17:19.0545168Z  * [new tag]                 ciflow/inductor/169504      -> ciflow/inductor/169504
2025-12-04T09:17:19.0546762Z  * [new tag]                 ciflow/inductor/169505      -> ciflow/inductor/169505
2025-12-04T09:17:19.0548489Z  * [new tag]                 ciflow/inductor/169508      -> ciflow/inductor/169508
2025-12-04T09:17:19.0549914Z  * [new tag]                 ciflow/inductor/169509      -> ciflow/inductor/169509
2025-12-04T09:17:19.0551345Z  * [new tag]                 ciflow/inductor/169513      -> ciflow/inductor/169513
2025-12-04T09:17:19.0552688Z  * [new tag]                 ciflow/inductor/169514      -> ciflow/inductor/169514
2025-12-04T09:17:19.0554010Z  * [new tag]                 ciflow/inductor/169515      -> ciflow/inductor/169515
2025-12-04T09:17:19.0555337Z  * [new tag]                 ciflow/inductor/169517      -> ciflow/inductor/169517
2025-12-04T09:17:19.0556673Z  * [new tag]                 ciflow/inductor/169519      -> ciflow/inductor/169519
2025-12-04T09:17:19.0558017Z  * [new tag]                 ciflow/inductor/169520      -> ciflow/inductor/169520
2025-12-04T09:17:19.0559360Z  * [new tag]                 ciflow/inductor/169521      -> ciflow/inductor/169521
2025-12-04T09:17:19.0560680Z  * [new tag]                 ciflow/inductor/169524      -> ciflow/inductor/169524
2025-12-04T09:17:19.0562063Z  * [new tag]                 ciflow/inductor/169527      -> ciflow/inductor/169527
2025-12-04T09:17:19.0563393Z  * [new tag]                 ciflow/inductor/169528      -> ciflow/inductor/169528
2025-12-04T09:17:19.0564840Z  * [new tag]                 ciflow/inductor/169532      -> ciflow/inductor/169532
2025-12-04T09:17:19.0566170Z  * [new tag]                 ciflow/inductor/169535      -> ciflow/inductor/169535
2025-12-04T09:17:19.0567494Z  * [new tag]                 ciflow/inductor/169536      -> ciflow/inductor/169536
2025-12-04T09:17:19.0568966Z  * [new tag]                 ciflow/inductor/169547      -> ciflow/inductor/169547
2025-12-04T09:17:19.0569905Z  * [new tag]                 ciflow/inductor/169548      -> ciflow/inductor/169548
2025-12-04T09:17:19.0571504Z  * [new tag]                 ciflow/inductor/169549      -> ciflow/inductor/169549
2025-12-04T09:17:19.0572880Z  * [new tag]                 ciflow/inductor/169551      -> ciflow/inductor/169551
2025-12-04T09:17:19.0574180Z  * [new tag]                 ciflow/inductor/169552      -> ciflow/inductor/169552
2025-12-04T09:17:19.0576034Z  * [new tag]                 ciflow/inductor/169553      -> ciflow/inductor/169553
2025-12-04T09:17:19.0577377Z  * [new tag]                 ciflow/inductor/169557      -> ciflow/inductor/169557
2025-12-04T09:17:19.0579103Z  * [new tag]                 ciflow/inductor/3b9a386     -> ciflow/inductor/3b9a386
2025-12-04T09:17:19.0580830Z  * [new tag]                 ciflow/inductor/3d4b92b     -> ciflow/inductor/3d4b92b
2025-12-04T09:17:19.0582315Z  * [new tag]                 ciflow/inductor/d224ac7     -> ciflow/inductor/d224ac7
2025-12-04T09:17:19.0583907Z  * [new tag]                 ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994
2025-12-04T09:17:19.0585024Z  * [new tag]                 ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075
2025-12-04T09:17:19.0586327Z  * [new tag]                 ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876
2025-12-04T09:17:19.0587446Z  * [new tag]                 ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981
2025-12-04T09:17:19.0589055Z  * [new tag]                 ciflow/mps/166254           -> ciflow/mps/166254
2025-12-04T09:17:19.0590422Z  * [new tag]                 ciflow/mps/169017           -> ciflow/mps/169017
2025-12-04T09:17:19.0591895Z  * [new tag]                 ciflow/mps/169372           -> ciflow/mps/169372
2025-12-04T09:17:19.0593124Z  * [new tag]                 ciflow/mps/169478           -> ciflow/mps/169478
2025-12-04T09:17:19.0594717Z  * [new tag]                 ciflow/op-benchmark/157994  -> ciflow/op-benchmark/157994
2025-12-04T09:17:19.0596456Z  * [new tag]                 ciflow/op-benchmark/166075  -> ciflow/op-benchmark/166075
2025-12-04T09:17:19.0597386Z  * [new tag]                 ciflow/op-benchmark/169544  -> ciflow/op-benchmark/169544
2025-12-04T09:17:19.0599248Z  * [new tag]                 ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997
2025-12-04T09:17:19.0600650Z  * [new tag]                 ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517
2025-12-04T09:17:19.0601827Z  * [new tag]                 ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063
2025-12-04T09:17:19.0603206Z  * [new tag]                 ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425
2025-12-04T09:17:19.0604732Z  * [new tag]                 ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517
2025-12-04T09:17:19.0606029Z  * [new tag]                 ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063
2025-12-04T09:17:19.0607022Z  * [new tag]                 ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425
2025-12-04T09:17:19.0609270Z  * [new tag]                 ciflow/periodic/054a2fd     -> ciflow/periodic/054a2fd
2025-12-04T09:17:19.0610390Z  * [new tag]                 ciflow/periodic/167207      -> ciflow/periodic/167207
2025-12-04T09:17:19.0611872Z  * [new tag]                 ciflow/periodic/167978      -> ciflow/periodic/167978
2025-12-04T09:17:19.0613105Z  * [new tag]                 ciflow/periodic/168096      -> ciflow/periodic/168096
2025-12-04T09:17:19.0614315Z  * [new tag]                 ciflow/periodic/169286      -> ciflow/periodic/169286
2025-12-04T09:17:19.0615772Z  * [new tag]                 ciflow/periodic/2a6d37d     -> ciflow/periodic/2a6d37d
2025-12-04T09:17:19.0617209Z  * [new tag]                 ciflow/periodic/317eeb8     -> ciflow/periodic/317eeb8
2025-12-04T09:17:19.0618747Z  * [new tag]                 ciflow/periodic/3c32        -> ciflow/periodic/3c32
2025-12-04T09:17:19.0620156Z  * [new tag]                 ciflow/periodic/3e98831     -> ciflow/periodic/3e98831
2025-12-04T09:17:19.0622254Z  * [new tag]                 ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T09:17:19.0623876Z  * [new tag]                 ciflow/periodic/94512-point -> ciflow/periodic/94512-point
2025-12-04T09:17:19.0625681Z  * [new tag]                 ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519
2025-12-04T09:17:19.0627172Z  * [new tag]                 ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275
2025-12-04T09:17:19.0628612Z  * [new tag]                 ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761
2025-12-04T09:17:19.0630187Z  * [new tag]                 ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12
2025-12-04T09:17:19.0631922Z  * [new tag]                 ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0
2025-12-04T09:17:19.0633493Z  * [new tag]                 ciflow/periodic/sha-ec5b83  -> ciflow/periodic/sha-ec5b83
2025-12-04T09:17:19.0634974Z  * [new tag]                 ciflow/pull/167207          -> ciflow/pull/167207
2025-12-04T09:17:19.0636846Z  * [new tag]                 ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207
2025-12-04T09:17:19.0638319Z  * [new tag]                 ciflow/rocm-mi200/165545    -> ciflow/rocm-mi200/165545
2025-12-04T09:17:19.0639545Z  * [new tag]                 ciflow/rocm-mi200/165997    -> ciflow/rocm-mi200/165997
2025-12-04T09:17:19.0640761Z  * [new tag]                 ciflow/rocm-mi200/168096    -> ciflow/rocm-mi200/168096
2025-12-04T09:17:19.0642187Z  * [new tag]                 ciflow/rocm-mi200/168275    -> ciflow/rocm-mi200/168275
2025-12-04T09:17:19.0643414Z  * [new tag]                 ciflow/rocm-mi200/169063    -> ciflow/rocm-mi200/169063
2025-12-04T09:17:19.0644808Z  * [new tag]                 ciflow/rocm-mi200/169356    -> ciflow/rocm-mi200/169356
2025-12-04T09:17:19.0645898Z  * [new tag]                 ciflow/rocm-mi200/169425    -> ciflow/rocm-mi200/169425
2025-12-04T09:17:19.0647549Z  * [new tag]                 ciflow/rocm-mi300/165545    -> ciflow/rocm-mi300/165545
2025-12-04T09:17:19.0649006Z  * [new tag]                 ciflow/rocm-mi300/167157    -> ciflow/rocm-mi300/167157
2025-12-04T09:17:19.0650228Z  * [new tag]                 ciflow/rocm-mi300/168096    -> ciflow/rocm-mi300/168096
2025-12-04T09:17:19.0651455Z  * [new tag]                 ciflow/rocm-mi300/169063    -> ciflow/rocm-mi300/169063
2025-12-04T09:17:19.0652537Z  * [new tag]                 ciflow/rocm-mi300/169425    -> ciflow/rocm-mi300/169425
2025-12-04T09:17:19.0654175Z  * [new tag]                 ciflow/rocm-mi355/167157    -> ciflow/rocm-mi355/167157
2025-12-04T09:17:19.0655501Z  * [new tag]                 ciflow/rocm-mi355/168275    -> ciflow/rocm-mi355/168275
2025-12-04T09:17:19.0656737Z  * [new tag]                 ciflow/rocm-mi355/169425    -> ciflow/rocm-mi355/169425
2025-12-04T09:17:19.0658302Z  * [new tag]                 ciflow/rocm-navi31/168275   -> ciflow/rocm-navi31/168275
2025-12-04T09:17:19.0659610Z  * [new tag]                 ciflow/rocm-navi31/169425   -> ciflow/rocm-navi31/169425
2025-12-04T09:17:19.0661121Z  * [new tag]                 ciflow/rocm/115316          -> ciflow/rocm/115316
2025-12-04T09:17:19.0662348Z  * [new tag]                 ciflow/rocm/148492          -> ciflow/rocm/148492
2025-12-04T09:17:19.0663583Z  * [new tag]                 ciflow/rocm/160685          -> ciflow/rocm/160685
2025-12-04T09:17:19.0664808Z  * [new tag]                 ciflow/rocm/161607          -> ciflow/rocm/161607
2025-12-04T09:17:19.0666108Z  * [new tag]                 ciflow/rocm/162052          -> ciflow/rocm/162052
2025-12-04T09:17:19.0667332Z  * [new tag]                 ciflow/rocm/165997          -> ciflow/rocm/165997
2025-12-04T09:17:19.0668697Z  * [new tag]                 ciflow/rocm/166165          -> ciflow/rocm/166165
2025-12-04T09:17:19.0669632Z  * [new tag]                 ciflow/rocm/166517          -> ciflow/rocm/166517
2025-12-04T09:17:19.0671082Z  * [new tag]                 ciflow/rocm/167207          -> ciflow/rocm/167207
2025-12-04T09:17:19.0672316Z  * [new tag]                 ciflow/rocm/167536          -> ciflow/rocm/167536
2025-12-04T09:17:19.0673314Z  * [new tag]                 ciflow/rocm/167781          -> ciflow/rocm/167781
2025-12-04T09:17:19.0675126Z  * [new tag]                 ciflow/rocm/167989          -> ciflow/rocm/167989
2025-12-04T09:17:19.0676818Z  * [new tag]                 ciflow/rocm/168073          -> ciflow/rocm/168073
2025-12-04T09:17:19.0678368Z  * [new tag]                 ciflow/rocm/168195          -> ciflow/rocm/168195
2025-12-04T09:17:19.0679706Z  * [new tag]                 ciflow/rocm/168939          -> ciflow/rocm/168939
2025-12-04T09:17:19.0681001Z  * [new tag]                 ciflow/rocm/168971          -> ciflow/rocm/168971
2025-12-04T09:17:19.0682309Z  * [new tag]                 ciflow/rocm/169024          -> ciflow/rocm/169024
2025-12-04T09:17:19.0683597Z  * [new tag]                 ciflow/rocm/169200          -> ciflow/rocm/169200
2025-12-04T09:17:19.0684880Z  * [new tag]                 ciflow/rocm/169216          -> ciflow/rocm/169216
2025-12-04T09:17:19.0686183Z  * [new tag]                 ciflow/rocm/169312          -> ciflow/rocm/169312
2025-12-04T09:17:19.0687492Z  * [new tag]                 ciflow/rocm/169380          -> ciflow/rocm/169380
2025-12-04T09:17:19.0688858Z  * [new tag]                 ciflow/rocm/169427          -> ciflow/rocm/169427
2025-12-04T09:17:19.0690163Z  * [new tag]                 ciflow/rocm/169455          -> ciflow/rocm/169455
2025-12-04T09:17:19.0691439Z  * [new tag]                 ciflow/rocm/169470          -> ciflow/rocm/169470
2025-12-04T09:17:19.0692734Z  * [new tag]                 ciflow/rocm/169471          -> ciflow/rocm/169471
2025-12-04T09:17:19.0694048Z  * [new tag]                 ciflow/rocm/169472          -> ciflow/rocm/169472
2025-12-04T09:17:19.0695350Z  * [new tag]                 ciflow/rocm/169514          -> ciflow/rocm/169514
2025-12-04T09:17:19.0697053Z  * [new tag]                 ciflow/slow/01c7106         -> ciflow/slow/01c7106
2025-12-04T09:17:19.0698411Z  * [new tag]                 ciflow/slow/0577043         -> ciflow/slow/0577043
2025-12-04T09:17:19.0700455Z  * [new tag]                 ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym
2025-12-04T09:17:19.0701262Z  * [new tag]                 ciflow/slow/0e81104         -> ciflow/slow/0e81104
2025-12-04T09:17:19.0702726Z  * [new tag]                 ciflow/slow/167207          -> ciflow/slow/167207
2025-12-04T09:17:19.0704491Z  * [new tag]                 ciflow/slow/168050          -> ciflow/slow/168050
2025-12-04T09:17:19.0705927Z  * [new tag]                 ciflow/slow/1732077         -> ciflow/slow/1732077
2025-12-04T09:17:19.0707433Z  * [new tag]                 ciflow/slow/187eb7c         -> ciflow/slow/187eb7c
2025-12-04T09:17:19.0711509Z  * [new tag]                 ciflow/slow/1faef89         -> ciflow/slow/1faef89
2025-12-04T09:17:19.0713284Z  * [new tag]                 ciflow/slow/3920ec1         -> ciflow/slow/3920ec1
2025-12-04T09:17:19.0714944Z  * [new tag]                 ciflow/slow/3b7c6b2         -> ciflow/slow/3b7c6b2
2025-12-04T09:17:19.0716434Z  * [new tag]                 ciflow/slow/59a3759         -> ciflow/slow/59a3759
2025-12-04T09:17:19.0723966Z  * [new tag]                 ciflow/slow/70ef0bb         -> ciflow/slow/70ef0bb
2025-12-04T09:17:19.0724367Z  * [new tag]                 ciflow/slow/788ff06         -> ciflow/slow/788ff06
2025-12-04T09:17:19.0724961Z  * [new tag]                 ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym
2025-12-04T09:17:19.0725141Z  * [new tag]                 ciflow/slow/9d85864         -> ciflow/slow/9d85864
2025-12-04T09:17:19.0725471Z  * [new tag]                 ciflow/slow/9ffad5b         -> ciflow/slow/9ffad5b
2025-12-04T09:17:19.0725648Z  * [new tag]                 ciflow/slow/a206e8b         -> ciflow/slow/a206e8b
2025-12-04T09:17:19.0726808Z  * [new tag]                 ciflow/slow/a837609         -> ciflow/slow/a837609
2025-12-04T09:17:19.0728443Z  * [new tag]                 ciflow/slow/af841f3         -> ciflow/slow/af841f3
2025-12-04T09:17:19.0730448Z  * [new tag]                 ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym
2025-12-04T09:17:19.0731564Z  * [new tag]                 ciflow/torchbench/168175    -> ciflow/torchbench/168175
2025-12-04T09:17:19.0733237Z  * [new tag]                 ciflow/trunk/148492         -> ciflow/trunk/148492
2025-12-04T09:17:19.0734320Z  * [new tag]                 ciflow/trunk/157149         -> ciflow/trunk/157149
2025-12-04T09:17:19.0735675Z  * [new tag]                 ciflow/trunk/157994         -> ciflow/trunk/157994
2025-12-04T09:17:19.0736879Z  * [new tag]                 ciflow/trunk/159718         -> ciflow/trunk/159718
2025-12-04T09:17:19.0738156Z  * [new tag]                 ciflow/trunk/160685         -> ciflow/trunk/160685
2025-12-04T09:17:19.0739286Z  * [new tag]                 ciflow/trunk/160729         -> ciflow/trunk/160729
2025-12-04T09:17:19.0740720Z  * [new tag]                 ciflow/trunk/162275         -> ciflow/trunk/162275
2025-12-04T09:17:19.0742431Z  * [new tag]                 ciflow/trunk/162795         -> ciflow/trunk/162795
2025-12-04T09:17:19.0743735Z  * [new tag]                 ciflow/trunk/163245         -> ciflow/trunk/163245
2025-12-04T09:17:19.0744813Z  * [new tag]                 ciflow/trunk/163942         -> ciflow/trunk/163942
2025-12-04T09:17:19.0746181Z  * [new tag]                 ciflow/trunk/165274         -> ciflow/trunk/165274
2025-12-04T09:17:19.0747919Z  * [new tag]                 ciflow/trunk/165483         -> ciflow/trunk/165483
2025-12-04T09:17:19.0749632Z  * [new tag]                 ciflow/trunk/165728         -> ciflow/trunk/165728
2025-12-04T09:17:19.0751088Z  * [new tag]                 ciflow/trunk/165922         -> ciflow/trunk/165922
2025-12-04T09:17:19.0752415Z  * [new tag]                 ciflow/trunk/166075         -> ciflow/trunk/166075
2025-12-04T09:17:19.0753725Z  * [new tag]                 ciflow/trunk/166165         -> ciflow/trunk/166165
2025-12-04T09:17:19.0755247Z  * [new tag]                 ciflow/trunk/166829         -> ciflow/trunk/166829
2025-12-04T09:17:19.0756667Z  * [new tag]                 ciflow/trunk/166843         -> ciflow/trunk/166843
2025-12-04T09:17:19.0757991Z  * [new tag]                 ciflow/trunk/166876         -> ciflow/trunk/166876
2025-12-04T09:17:19.0759301Z  * [new tag]                 ciflow/trunk/167207         -> ciflow/trunk/167207
2025-12-04T09:17:19.0760679Z  * [new tag]                 ciflow/trunk/167536         -> ciflow/trunk/167536
2025-12-04T09:17:19.0761942Z  * [new tag]                 ciflow/trunk/167552         -> ciflow/trunk/167552
2025-12-04T09:17:19.0763270Z  * [new tag]                 ciflow/trunk/167555         -> ciflow/trunk/167555
2025-12-04T09:17:19.0764650Z  * [new tag]                 ciflow/trunk/167599         -> ciflow/trunk/167599
2025-12-04T09:17:19.0766051Z  * [new tag]                 ciflow/trunk/167659         -> ciflow/trunk/167659
2025-12-04T09:17:19.0767444Z  * [new tag]                 ciflow/trunk/167672         -> ciflow/trunk/167672
2025-12-04T09:17:19.0768768Z  * [new tag]                 ciflow/trunk/167742         -> ciflow/trunk/167742
2025-12-04T09:17:19.0770080Z  * [new tag]                 ciflow/trunk/167781         -> ciflow/trunk/167781
2025-12-04T09:17:19.0771623Z  * [new tag]                 ciflow/trunk/167837         -> ciflow/trunk/167837
2025-12-04T09:17:19.0772898Z  * [new tag]                 ciflow/trunk/167887         -> ciflow/trunk/167887
2025-12-04T09:17:19.0774212Z  * [new tag]                 ciflow/trunk/167978         -> ciflow/trunk/167978
2025-12-04T09:17:19.0775659Z  * [new tag]                 ciflow/trunk/168050         -> ciflow/trunk/168050
2025-12-04T09:17:19.0776926Z  * [new tag]                 ciflow/trunk/168051         -> ciflow/trunk/168051
2025-12-04T09:17:19.0778185Z  * [new tag]                 ciflow/trunk/168096         -> ciflow/trunk/168096
2025-12-04T09:17:19.0779598Z  * [new tag]                 ciflow/trunk/168127         -> ciflow/trunk/168127
2025-12-04T09:17:19.0780949Z  * [new tag]                 ciflow/trunk/168157         -> ciflow/trunk/168157
2025-12-04T09:17:19.0782276Z  * [new tag]                 ciflow/trunk/168175         -> ciflow/trunk/168175
2025-12-04T09:17:19.0783546Z  * [new tag]                 ciflow/trunk/168209         -> ciflow/trunk/168209
2025-12-04T09:17:19.0785015Z  * [new tag]                 ciflow/trunk/168213         -> ciflow/trunk/168213
2025-12-04T09:17:19.0786478Z  * [new tag]                 ciflow/trunk/168226         -> ciflow/trunk/168226
2025-12-04T09:17:19.0787899Z  * [new tag]                 ciflow/trunk/168262         -> ciflow/trunk/168262
2025-12-04T09:17:19.0789136Z  * [new tag]                 ciflow/trunk/168275         -> ciflow/trunk/168275
2025-12-04T09:17:19.0790569Z  * [new tag]                 ciflow/trunk/168328         -> ciflow/trunk/168328
2025-12-04T09:17:19.0791884Z  * [new tag]                 ciflow/trunk/168368         -> ciflow/trunk/168368
2025-12-04T09:17:19.0793202Z  * [new tag]                 ciflow/trunk/168917         -> ciflow/trunk/168917
2025-12-04T09:17:19.0794527Z  * [new tag]                 ciflow/trunk/168933         -> ciflow/trunk/168933
2025-12-04T09:17:19.0796029Z  * [new tag]                 ciflow/trunk/168941         -> ciflow/trunk/168941
2025-12-04T09:17:19.0797345Z  * [new tag]                 ciflow/trunk/168955         -> ciflow/trunk/168955
2025-12-04T09:17:19.0798772Z  * [new tag]                 ciflow/trunk/168980         -> ciflow/trunk/168980
2025-12-04T09:17:19.0800321Z  * [new tag]                 ciflow/trunk/169004         -> ciflow/trunk/169004
2025-12-04T09:17:19.0801609Z  * [new tag]                 ciflow/trunk/169006         -> ciflow/trunk/169006
2025-12-04T09:17:19.0802921Z  * [new tag]                 ciflow/trunk/169023         -> ciflow/trunk/169023
2025-12-04T09:17:19.0804252Z  * [new tag]                 ciflow/trunk/169025         -> ciflow/trunk/169025
2025-12-04T09:17:19.0805577Z  * [new tag]                 ciflow/trunk/169048         -> ciflow/trunk/169048
2025-12-04T09:17:19.0806905Z  * [new tag]                 ciflow/trunk/169066         -> ciflow/trunk/169066
2025-12-04T09:17:19.0808406Z  * [new tag]                 ciflow/trunk/169091         -> ciflow/trunk/169091
2025-12-04T09:17:19.0809797Z  * [new tag]                 ciflow/trunk/169102         -> ciflow/trunk/169102
2025-12-04T09:17:19.0811070Z  * [new tag]                 ciflow/trunk/169103         -> ciflow/trunk/169103
2025-12-04T09:17:19.0812524Z  * [new tag]                 ciflow/trunk/169125         -> ciflow/trunk/169125
2025-12-04T09:17:19.0814015Z  * [new tag]                 ciflow/trunk/169139         -> ciflow/trunk/169139
2025-12-04T09:17:19.0815433Z  * [new tag]                 ciflow/trunk/169148         -> ciflow/trunk/169148
2025-12-04T09:17:19.0816766Z  * [new tag]                 ciflow/trunk/169151         -> ciflow/trunk/169151
2025-12-04T09:17:19.0818164Z  * [new tag]                 ciflow/trunk/169156         -> ciflow/trunk/169156
2025-12-04T09:17:19.0819689Z  * [new tag]                 ciflow/trunk/169176         -> ciflow/trunk/169176
2025-12-04T09:17:19.0821034Z  * [new tag]                 ciflow/trunk/169204         -> ciflow/trunk/169204
2025-12-04T09:17:19.0822328Z  * [new tag]                 ciflow/trunk/169207         -> ciflow/trunk/169207
2025-12-04T09:17:19.0823650Z  * [new tag]                 ciflow/trunk/169211         -> ciflow/trunk/169211
2025-12-04T09:17:19.0825199Z  * [new tag]                 ciflow/trunk/169231         -> ciflow/trunk/169231
2025-12-04T09:17:19.0826692Z  * [new tag]                 ciflow/trunk/169260         -> ciflow/trunk/169260
2025-12-04T09:17:19.0828165Z  * [new tag]                 ciflow/trunk/169271         -> ciflow/trunk/169271
2025-12-04T09:17:19.0829477Z  * [new tag]                 ciflow/trunk/169280         -> ciflow/trunk/169280
2025-12-04T09:17:19.0831484Z  * [new tag]                 ciflow/trunk/169281         -> ciflow/trunk/169281
2025-12-04T09:17:19.0832727Z  * [new tag]                 ciflow/trunk/169286         -> ciflow/trunk/169286
2025-12-04T09:17:19.0834341Z  * [new tag]                 ciflow/trunk/169293         -> ciflow/trunk/169293
2025-12-04T09:17:19.0835666Z  * [new tag]                 ciflow/trunk/169296         -> ciflow/trunk/169296
2025-12-04T09:17:19.0837052Z  * [new tag]                 ciflow/trunk/169304         -> ciflow/trunk/169304
2025-12-04T09:17:19.0838379Z  * [new tag]                 ciflow/trunk/169305         -> ciflow/trunk/169305
2025-12-04T09:17:19.0839716Z  * [new tag]                 ciflow/trunk/169312         -> ciflow/trunk/169312
2025-12-04T09:17:19.0841296Z  * [new tag]                 ciflow/trunk/169328         -> ciflow/trunk/169328
2025-12-04T09:17:19.0842605Z  * [new tag]                 ciflow/trunk/169343         -> ciflow/trunk/169343
2025-12-04T09:17:19.0844028Z  * [new tag]                 ciflow/trunk/169355         -> ciflow/trunk/169355
2025-12-04T09:17:19.0845349Z  * [new tag]                 ciflow/trunk/169370         -> ciflow/trunk/169370
2025-12-04T09:17:19.0846805Z  * [new tag]                 ciflow/trunk/169379         -> ciflow/trunk/169379
2025-12-04T09:17:19.0848164Z  * [new tag]                 ciflow/trunk/169380         -> ciflow/trunk/169380
2025-12-04T09:17:19.0849465Z  * [new tag]                 ciflow/trunk/169385         -> ciflow/trunk/169385
2025-12-04T09:17:19.0850790Z  * [new tag]                 ciflow/trunk/169387         -> ciflow/trunk/169387
2025-12-04T09:17:19.0852295Z  * [new tag]                 ciflow/trunk/169410         -> ciflow/trunk/169410
2025-12-04T09:17:19.0853664Z  * [new tag]                 ciflow/trunk/169412         -> ciflow/trunk/169412
2025-12-04T09:17:19.0854973Z  * [new tag]                 ciflow/trunk/169418         -> ciflow/trunk/169418
2025-12-04T09:17:19.0856272Z  * [new tag]                 ciflow/trunk/169423         -> ciflow/trunk/169423
2025-12-04T09:17:19.0857639Z  * [new tag]                 ciflow/trunk/169427         -> ciflow/trunk/169427
2025-12-04T09:17:19.0859015Z  * [new tag]                 ciflow/trunk/169430         -> ciflow/trunk/169430
2025-12-04T09:17:19.0860373Z  * [new tag]                 ciflow/trunk/169437         -> ciflow/trunk/169437
2025-12-04T09:17:19.0861706Z  * [new tag]                 ciflow/trunk/169442         -> ciflow/trunk/169442
2025-12-04T09:17:19.0863026Z  * [new tag]                 ciflow/trunk/169452         -> ciflow/trunk/169452
2025-12-04T09:17:19.0864333Z  * [new tag]                 ciflow/trunk/169454         -> ciflow/trunk/169454
2025-12-04T09:17:19.0865640Z  * [new tag]                 ciflow/trunk/169459         -> ciflow/trunk/169459
2025-12-04T09:17:19.0867088Z  * [new tag]                 ciflow/trunk/169474         -> ciflow/trunk/169474
2025-12-04T09:17:19.0868444Z  * [new tag]                 ciflow/trunk/169475         -> ciflow/trunk/169475
2025-12-04T09:17:19.0869739Z  * [new tag]                 ciflow/trunk/169476         -> ciflow/trunk/169476
2025-12-04T09:17:19.0871237Z  * [new tag]                 ciflow/trunk/169487         -> ciflow/trunk/169487
2025-12-04T09:17:19.0872547Z  * [new tag]                 ciflow/trunk/169497         -> ciflow/trunk/169497
2025-12-04T09:17:19.0873883Z  * [new tag]                 ciflow/trunk/169503         -> ciflow/trunk/169503
2025-12-04T09:17:19.0875195Z  * [new tag]                 ciflow/trunk/169505         -> ciflow/trunk/169505
2025-12-04T09:17:19.0876571Z  * [new tag]                 ciflow/trunk/169507         -> ciflow/trunk/169507
2025-12-04T09:17:19.0877845Z  * [new tag]                 ciflow/trunk/169514         -> ciflow/trunk/169514
2025-12-04T09:17:19.0879309Z  * [new tag]                 ciflow/trunk/169517         -> ciflow/trunk/169517
2025-12-04T09:17:19.0880552Z  * [new tag]                 ciflow/trunk/169519         -> ciflow/trunk/169519
2025-12-04T09:17:19.0881842Z  * [new tag]                 ciflow/trunk/169528         -> ciflow/trunk/169528
2025-12-04T09:17:19.0883065Z  * [new tag]                 ciflow/trunk/169541         -> ciflow/trunk/169541
2025-12-04T09:17:19.0884610Z  * [new tag]                 ciflow/trunk/169555         -> ciflow/trunk/169555
2025-12-04T09:17:19.0886471Z  * [new tag]                 ciflow/unstable/123         -> ciflow/unstable/123
2025-12-04T09:17:19.0888057Z  * [new tag]                 ciflow/vllm/165270          -> ciflow/vllm/165270
2025-12-04T09:17:19.0889314Z  * [new tag]                 ciflow/vllm/165274          -> ciflow/vllm/165274
2025-12-04T09:17:19.0890565Z  * [new tag]                 ciflow/vllm/166494          -> ciflow/vllm/166494
2025-12-04T09:17:19.0891831Z  * [new tag]                 ciflow/vllm/169219          -> ciflow/vllm/169219
2025-12-04T09:17:19.0893060Z  * [new tag]                 ciflow/vllm/169220          -> ciflow/vllm/169220
2025-12-04T09:17:19.0894648Z  * [new tag]                 ciflow/xpu/157994           -> ciflow/xpu/157994
2025-12-04T09:17:19.0895901Z  * [new tag]                 ciflow/xpu/159718           -> ciflow/xpu/159718
2025-12-04T09:17:19.0897215Z  * [new tag]                 ciflow/xpu/161940           -> ciflow/xpu/161940
2025-12-04T09:17:19.0898554Z  * [new tag]                 ciflow/xpu/163251           -> ciflow/xpu/163251
2025-12-04T09:17:19.0899911Z  * [new tag]                 ciflow/xpu/166829           -> ciflow/xpu/166829
2025-12-04T09:17:19.0901125Z  * [new tag]                 ciflow/xpu/166843           -> ciflow/xpu/166843
2025-12-04T09:17:19.0902434Z  * [new tag]                 ciflow/xpu/167972           -> ciflow/xpu/167972
2025-12-04T09:17:19.0903506Z  * [new tag]                 ciflow/xpu/167981           -> ciflow/xpu/167981
2025-12-04T09:17:19.0904863Z  * [new tag]                 ciflow/xpu/168213           -> ciflow/xpu/168213
2025-12-04T09:17:19.0906118Z  * [new tag]                 ciflow/xpu/168262           -> ciflow/xpu/168262
2025-12-04T09:17:19.0907420Z  * [new tag]                 ciflow/xpu/168328           -> ciflow/xpu/168328
2025-12-04T09:17:19.0909237Z  * [new tag]                 ciflow/xpu/168950           -> ciflow/xpu/168950
2025-12-04T09:17:19.0910984Z  * [new tag]                 ciflow/xpu/169039           -> ciflow/xpu/169039
2025-12-04T09:17:19.0912510Z  * [new tag]                 ciflow/xpu/169200           -> ciflow/xpu/169200
2025-12-04T09:17:19.0913899Z  * [new tag]                 ciflow/xpu/169203           -> ciflow/xpu/169203
2025-12-04T09:17:19.0915175Z  * [new tag]                 ciflow/xpu/169230           -> ciflow/xpu/169230
2025-12-04T09:17:19.0916494Z  * [new tag]                 ciflow/xpu/169231           -> ciflow/xpu/169231
2025-12-04T09:17:19.0917957Z  * [new tag]                 ciflow/xpu/169241           -> ciflow/xpu/169241
2025-12-04T09:17:19.0919322Z  * [new tag]                 ciflow/xpu/169280           -> ciflow/xpu/169280
2025-12-04T09:17:19.0920664Z  * [new tag]                 ciflow/xpu/169296           -> ciflow/xpu/169296
2025-12-04T09:17:19.0922124Z  * [new tag]                 ciflow/xpu/169353           -> ciflow/xpu/169353
2025-12-04T09:17:19.0923460Z  * [new tag]                 ciflow/xpu/169410           -> ciflow/xpu/169410
2025-12-04T09:17:19.0924802Z  * [new tag]                 ciflow/xpu/169442           -> ciflow/xpu/169442
2025-12-04T09:17:19.0926159Z  * [new tag]                 ciflow/xpu/169555           -> ciflow/xpu/169555
2025-12-04T09:17:19.0927631Z  * [new tag]                 cslpull75                   -> cslpull75
2025-12-04T09:17:19.0929164Z  * [new tag]                 cslpull76                   -> cslpull76
2025-12-04T09:17:19.0930528Z  * [new tag]                 cslpull77                   -> cslpull77
2025-12-04T09:17:19.0931950Z  * [new tag]                 cslpull78                   -> cslpull78
2025-12-04T09:17:19.0933421Z  * [new tag]                 cslpull79                   -> cslpull79
2025-12-04T09:17:19.0935181Z  * [new tag]                 cslpull80                   -> cslpull80
2025-12-04T09:17:19.0936629Z  * [new tag]                 cslpull81                   -> cslpull81
2025-12-04T09:17:19.0938027Z  * [new tag]                 cslpull82                   -> cslpull82
2025-12-04T09:17:19.0939522Z  * [new tag]                 cslpull83                   -> cslpull83
2025-12-04T09:17:19.0940903Z  * [new tag]                 cslpull84                   -> cslpull84
2025-12-04T09:17:19.0942248Z  * [new tag]                 cslpull85                   -> cslpull85
2025-12-04T09:17:19.0943661Z  * [new tag]                 cslpull86                   -> cslpull86
2025-12-04T09:17:19.0945026Z  * [new tag]                 cslpull87                   -> cslpull87
2025-12-04T09:17:19.0946458Z  * [new tag]                 cslpull88                   -> cslpull88
2025-12-04T09:17:19.0947878Z  * [new tag]                 cslpull89                   -> cslpull89
2025-12-04T09:17:19.0949053Z  * [new tag]                 cslpull90                   -> cslpull90
2025-12-04T09:17:19.0950863Z  * [new tag]                 cslpull91                   -> cslpull91
2025-12-04T09:17:19.0952205Z  * [new tag]                 cslpull92                   -> cslpull92
2025-12-04T09:17:19.0953754Z  * [new tag]                 flight_5                    -> flight_5
2025-12-04T09:17:19.0955293Z  * [new tag]                 flight_5.1                  -> flight_5.1
2025-12-04T09:17:19.0956698Z  * [new tag]                 flight_5.2                  -> flight_5.2
2025-12-04T09:17:19.0958136Z  * [new tag]                 flight_5.3                  -> flight_5.3
2025-12-04T09:17:19.0959606Z  * [new tag]                 forpull1                    -> forpull1
2025-12-04T09:17:19.0961248Z  * [new tag]                 malfet/tag-2ef5611          -> malfet/tag-2ef5611
2025-12-04T09:17:19.0962757Z  * [new tag]                 malfet/tag-317b1a0          -> malfet/tag-317b1a0
2025-12-04T09:17:19.0964120Z  * [new tag]                 malfet/tag-ec6f767          -> malfet/tag-ec6f767
2025-12-04T09:17:19.0965910Z  * [new tag]                 nightly-binary              -> nightly-binary
2025-12-04T09:17:19.0967116Z  * [new tag]                 sqzhang_flight4_plus        -> sqzhang_flight4_plus
2025-12-04T09:17:19.0968623Z  * [new tag]                 sqzhang_flight_3            -> sqzhang_flight_3
2025-12-04T09:17:19.0970502Z  * [new tag]                 trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272
2025-12-04T09:17:19.0972043Z  * [new tag]                 trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e
2025-12-04T09:17:19.0973755Z  * [new tag]                 trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88
2025-12-04T09:17:19.0975436Z  * [new tag]                 trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654
2025-12-04T09:17:19.0976758Z  * [new tag]                 trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb
2025-12-04T09:17:19.0978336Z  * [new tag]                 trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b
2025-12-04T09:17:19.0979877Z  * [new tag]                 trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672
2025-12-04T09:17:19.0981284Z  * [new tag]                 trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75
2025-12-04T09:17:19.0982927Z  * [new tag]                 trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2
2025-12-04T09:17:19.0984512Z  * [new tag]                 trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5
2025-12-04T09:17:19.0985656Z  * [new tag]                 trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688
2025-12-04T09:17:19.0987358Z  * [new tag]                 trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10
2025-12-04T09:17:19.0988777Z  * [new tag]                 trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec
2025-12-04T09:17:19.0990368Z  * [new tag]                 trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f
2025-12-04T09:17:19.0992016Z  * [new tag]                 trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11
2025-12-04T09:17:19.0993507Z  * [new tag]                 trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63
2025-12-04T09:17:19.0995001Z  * [new tag]                 trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5
2025-12-04T09:17:19.0996459Z  * [new tag]                 trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676
2025-12-04T09:17:19.0997689Z  * [new tag]                 trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e
2025-12-04T09:17:19.0999249Z  * [new tag]                 trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520
2025-12-04T09:17:19.1000782Z  * [new tag]                 trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8
2025-12-04T09:17:19.1002311Z  * [new tag]                 trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c
2025-12-04T09:17:19.1003373Z  * [new tag]                 trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d
2025-12-04T09:17:19.1005101Z  * [new tag]                 trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8
2025-12-04T09:17:19.1006669Z  * [new tag]                 trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de
2025-12-04T09:17:19.1008468Z  * [new tag]                 trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543
2025-12-04T09:17:19.1011376Z  * [new tag]                 trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7
2025-12-04T09:17:19.1012681Z  * [new tag]                 trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f
2025-12-04T09:17:19.1014415Z  * [new tag]                 trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9
2025-12-04T09:17:19.1015504Z  * [new tag]                 trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87
2025-12-04T09:17:19.1017209Z  * [new tag]                 trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc
2025-12-04T09:17:19.1018654Z  * [new tag]                 trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25
2025-12-04T09:17:19.1020229Z  * [new tag]                 trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563
2025-12-04T09:17:19.1021800Z  * [new tag]                 trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf
2025-12-04T09:17:19.1022867Z  * [new tag]                 trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd
2025-12-04T09:17:19.1025176Z  * [new tag]                 trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1
2025-12-04T09:17:19.1026773Z  * [new tag]                 trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708
2025-12-04T09:17:19.1028098Z  * [new tag]                 trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35
2025-12-04T09:17:19.1029628Z  * [new tag]                 trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec
2025-12-04T09:17:19.1031222Z  * [new tag]                 trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94
2025-12-04T09:17:19.1032519Z  * [new tag]                 trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1
2025-12-04T09:17:19.1034082Z  * [new tag]                 trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48
2025-12-04T09:17:19.1035401Z  * [new tag]                 trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991
2025-12-04T09:17:19.1037004Z  * [new tag]                 trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8
2025-12-04T09:17:19.1038315Z  * [new tag]                 trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf
2025-12-04T09:17:19.1039901Z  * [new tag]                 trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee
2025-12-04T09:17:19.1041448Z  * [new tag]                 trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938
2025-12-04T09:17:19.1042477Z  * [new tag]                 trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725
2025-12-04T09:17:19.1044125Z  * [new tag]                 trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae
2025-12-04T09:17:19.1045603Z  * [new tag]                 trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f
2025-12-04T09:17:19.1047073Z  * [new tag]                 trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c
2025-12-04T09:17:19.1048510Z  * [new tag]                 trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247
2025-12-04T09:17:19.1049939Z  * [new tag]                 trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c
2025-12-04T09:17:19.1051236Z  * [new tag]                 trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a
2025-12-04T09:17:19.1052739Z  * [new tag]                 trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686
2025-12-04T09:17:19.1054464Z  * [new tag]                 trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54
2025-12-04T09:17:19.1055522Z  * [new tag]                 trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af
2025-12-04T09:17:19.1057274Z  * [new tag]                 trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96
2025-12-04T09:17:19.1058764Z  * [new tag]                 trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873
2025-12-04T09:17:19.1060299Z  * [new tag]                 trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985
2025-12-04T09:17:19.1061369Z  * [new tag]                 trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f
2025-12-04T09:17:19.1063251Z  * [new tag]                 trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa
2025-12-04T09:17:19.1064841Z  * [new tag]                 trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c
2025-12-04T09:17:19.1066715Z  * [new tag]                 trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a
2025-12-04T09:17:19.1068166Z  * [new tag]                 trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d
2025-12-04T09:17:19.1069836Z  * [new tag]                 trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9
2025-12-04T09:17:19.1071299Z  * [new tag]                 trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3
2025-12-04T09:17:19.1072848Z  * [new tag]                 trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a
2025-12-04T09:17:19.1074365Z  * [new tag]                 trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292
2025-12-04T09:17:19.1075888Z  * [new tag]                 trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28
2025-12-04T09:17:19.1077432Z  * [new tag]                 trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66
2025-12-04T09:17:19.1078905Z  * [new tag]                 trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96
2025-12-04T09:17:19.1079975Z  * [new tag]                 trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc
2025-12-04T09:17:19.1081911Z  * [new tag]                 trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068
2025-12-04T09:17:19.1083425Z  * [new tag]                 trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd
2025-12-04T09:17:19.1084514Z  * [new tag]                 trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6
2025-12-04T09:17:19.1086406Z  * [new tag]                 trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883
2025-12-04T09:17:19.1087959Z  * [new tag]                 trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c
2025-12-04T09:17:19.1089459Z  * [new tag]                 trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b
2025-12-04T09:17:19.1090556Z  * [new tag]                 trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c
2025-12-04T09:17:19.1092249Z  * [new tag]                 trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202
2025-12-04T09:17:19.1093824Z  * [new tag]                 trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447
2025-12-04T09:17:19.1095379Z  * [new tag]                 trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef
2025-12-04T09:17:19.1096677Z  * [new tag]                 trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719
2025-12-04T09:17:19.1098322Z  * [new tag]                 trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b
2025-12-04T09:17:19.1099681Z  * [new tag]                 trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a
2025-12-04T09:17:19.1101292Z  * [new tag]                 trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206
2025-12-04T09:17:19.1102810Z  * [new tag]                 trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386
2025-12-04T09:17:19.1104282Z  * [new tag]                 trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4
2025-12-04T09:17:19.1105903Z  * [new tag]                 trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d
2025-12-04T09:17:19.1106936Z  * [new tag]                 trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b
2025-12-04T09:17:19.1108657Z  * [new tag]                 trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5
2025-12-04T09:17:19.1110272Z  * [new tag]                 trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8
2025-12-04T09:17:19.1111779Z  * [new tag]                 trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec
2025-12-04T09:17:19.1113256Z  * [new tag]                 trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71
2025-12-04T09:17:19.1114756Z  * [new tag]                 trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d
2025-12-04T09:17:19.1116279Z  * [new tag]                 trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a
2025-12-04T09:17:19.1117897Z  * [new tag]                 trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e
2025-12-04T09:17:19.1119356Z  * [new tag]                 trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8
2025-12-04T09:17:19.1121270Z  * [new tag]                 trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485
2025-12-04T09:17:19.1122868Z  * [new tag]                 trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734
2025-12-04T09:17:19.1125831Z  * [new tag]                 trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f
2025-12-04T09:17:19.1126248Z  * [new tag]                 trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696
2025-12-04T09:17:19.1127751Z  * [new tag]                 trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f -> trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f
2025-12-04T09:17:19.1128726Z  * [new tag]                 trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03
2025-12-04T09:17:19.1130540Z  * [new tag]                 trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6
2025-12-04T09:17:19.1132015Z  * [new tag]                 trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7
2025-12-04T09:17:19.1133082Z  * [new tag]                 trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3
2025-12-04T09:17:19.1134800Z  * [new tag]                 trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca
2025-12-04T09:17:19.1136292Z  * [new tag]                 trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b
2025-12-04T09:17:19.1137757Z  * [new tag]                 trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609
2025-12-04T09:17:19.1139349Z  * [new tag]                 trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b
2025-12-04T09:17:19.1140459Z  * [new tag]                 trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T09:17:19.1142094Z  * [new tag]                 trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8
2025-12-04T09:17:19.1143663Z  * [new tag]                 trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed
2025-12-04T09:17:19.1145252Z  * [new tag]                 trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8
2025-12-04T09:17:19.1146354Z  * [new tag]                 trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e
2025-12-04T09:17:19.1147988Z  * [new tag]                 trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead
2025-12-04T09:17:19.1149193Z  * [new tag]                 trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718
2025-12-04T09:17:19.1151160Z  * [new tag]                 trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0
2025-12-04T09:17:19.1152960Z  * [new tag]                 trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75
2025-12-04T09:17:19.1154249Z  * [new tag]                 trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece
2025-12-04T09:17:19.1155750Z  * [new tag]                 trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6
2025-12-04T09:17:19.1157423Z  * [new tag]                 trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4
2025-12-04T09:17:19.1158888Z  * [new tag]                 trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c
2025-12-04T09:17:19.1160350Z  * [new tag]                 trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43
2025-12-04T09:17:19.1161888Z  * [new tag]                 trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922
2025-12-04T09:17:19.1163442Z  * [new tag]                 trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca
2025-12-04T09:17:19.1164973Z  * [new tag]                 trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38
2025-12-04T09:17:19.1166448Z  * [new tag]                 trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90
2025-12-04T09:17:19.1167577Z  * [new tag]                 trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c
2025-12-04T09:17:19.1169398Z  * [new tag]                 trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87
2025-12-04T09:17:19.1170923Z  * [new tag]                 trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693
2025-12-04T09:17:19.1172451Z  * [new tag]                 trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa
2025-12-04T09:17:19.1173923Z  * [new tag]                 trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d
2025-12-04T09:17:19.1175434Z  * [new tag]                 trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639
2025-12-04T09:17:19.1176930Z  * [new tag]                 trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8
2025-12-04T09:17:19.1178434Z  * [new tag]                 trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d
2025-12-04T09:17:19.1180019Z  * [new tag]                 trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a
2025-12-04T09:17:19.1181581Z  * [new tag]                 trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742
2025-12-04T09:17:19.1183193Z  * [new tag]                 trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098
2025-12-04T09:17:19.1184642Z  * [new tag]                 trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa
2025-12-04T09:17:19.1186367Z  * [new tag]                 trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d
2025-12-04T09:17:19.1187682Z  * [new tag]                 trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c
2025-12-04T09:17:19.1189327Z  * [new tag]                 trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90
2025-12-04T09:17:19.1190825Z  * [new tag]                 trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c
2025-12-04T09:17:19.1191926Z  * [new tag]                 trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45
2025-12-04T09:17:19.1193732Z  * [new tag]                 trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109
2025-12-04T09:17:19.1195266Z  * [new tag]                 trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e
2025-12-04T09:17:19.1196721Z  * [new tag]                 trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e
2025-12-04T09:17:19.1197881Z  * [new tag]                 trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e
2025-12-04T09:17:19.1199501Z  * [new tag]                 trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48
2025-12-04T09:17:19.1201042Z  * [new tag]                 trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62
2025-12-04T09:17:19.1202601Z  * [new tag]                 trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2
2025-12-04T09:17:19.1204213Z  * [new tag]                 trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99
2025-12-04T09:17:19.1205794Z  * [new tag]                 trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24
2025-12-04T09:17:19.1207344Z  * [new tag]                 trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7
2025-12-04T09:17:19.1209038Z  * [new tag]                 trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a
2025-12-04T09:17:19.1210681Z  * [new tag]                 trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417
2025-12-04T09:17:19.1212167Z  * [new tag]                 trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4
2025-12-04T09:17:19.1213728Z  * [new tag]                 trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b
2025-12-04T09:17:19.1215418Z  * [new tag]                 trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e
2025-12-04T09:17:19.1216983Z  * [new tag]                 trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf
2025-12-04T09:17:19.1218872Z  * [new tag]                 trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5
2025-12-04T09:17:19.1220699Z  * [new tag]                 trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f
2025-12-04T09:17:19.1222376Z  * [new tag]                 trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f
2025-12-04T09:17:19.1223853Z  * [new tag]                 trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2
2025-12-04T09:17:19.1225357Z  * [new tag]                 trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55
2025-12-04T09:17:19.1227139Z  * [new tag]                 trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8
2025-12-04T09:17:19.1228243Z  * [new tag]                 trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09
2025-12-04T09:17:19.1230075Z  * [new tag]                 trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5
2025-12-04T09:17:19.1231845Z  * [new tag]                 trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564
2025-12-04T09:17:19.1233443Z  * [new tag]                 trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede
2025-12-04T09:17:19.1235081Z  * [new tag]                 trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac
2025-12-04T09:17:19.1236550Z  * [new tag]                 trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb
2025-12-04T09:17:19.1237685Z  * [new tag]                 trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1
2025-12-04T09:17:19.1239597Z  * [new tag]                 trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0
2025-12-04T09:17:19.1241273Z  * [new tag]                 trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09
2025-12-04T09:17:19.1242400Z  * [new tag]                 trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9
2025-12-04T09:17:19.1244180Z  * [new tag]                 trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a
2025-12-04T09:17:19.1245800Z  * [new tag]                 trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace
2025-12-04T09:17:19.1247231Z  * [new tag]                 trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39
2025-12-04T09:17:19.1248726Z  * [new tag]                 trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50
2025-12-04T09:17:19.1250258Z  * [new tag]                 trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01
2025-12-04T09:17:19.1251828Z  * [new tag]                 trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf
2025-12-04T09:17:19.1253358Z  * [new tag]                 trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8
2025-12-04T09:17:19.1254808Z  * [new tag]                 trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d
2025-12-04T09:17:19.1256383Z  * [new tag]                 trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47
2025-12-04T09:17:19.1257926Z  * [new tag]                 trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1
2025-12-04T09:17:19.1259472Z  * [new tag]                 trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e
2025-12-04T09:17:19.1261105Z  * [new tag]                 trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a
2025-12-04T09:17:19.1262594Z  * [new tag]                 trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b
2025-12-04T09:17:19.1264204Z  * [new tag]                 trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec
2025-12-04T09:17:19.1265841Z  * [new tag]                 trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf
2025-12-04T09:17:19.1267391Z  * [new tag]                 trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd
2025-12-04T09:17:19.1268854Z  * [new tag]                 trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889
2025-12-04T09:17:19.1270411Z  * [new tag]                 trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77
2025-12-04T09:17:19.1272207Z  * [new tag]                 trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c
2025-12-04T09:17:19.1273792Z  * [new tag]                 trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b
2025-12-04T09:17:19.1275093Z  * [new tag]                 trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235
2025-12-04T09:17:19.1276796Z  * [new tag]                 trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e
2025-12-04T09:17:19.1278450Z  * [new tag]                 trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc
2025-12-04T09:17:19.1280000Z  * [new tag]                 trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3
2025-12-04T09:17:19.1281563Z  * [new tag]                 trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf
2025-12-04T09:17:19.1283083Z  * [new tag]                 trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e
2025-12-04T09:17:19.1284598Z  * [new tag]                 trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e
2025-12-04T09:17:19.1286508Z  * [new tag]                 trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2
2025-12-04T09:17:19.1288034Z  * [new tag]                 trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4
2025-12-04T09:17:19.1289602Z  * [new tag]                 trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53
2025-12-04T09:17:19.1291054Z  * [new tag]                 trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598
2025-12-04T09:17:19.1292568Z  * [new tag]                 trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40
2025-12-04T09:17:19.1294110Z  * [new tag]                 trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56
2025-12-04T09:17:19.1295626Z  * [new tag]                 trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8
2025-12-04T09:17:19.1297111Z  * [new tag]                 trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467
2025-12-04T09:17:19.1298636Z  * [new tag]                 trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17
2025-12-04T09:17:19.1300423Z  * [new tag]                 trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73
2025-12-04T09:17:19.1301527Z  * [new tag]                 trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7
2025-12-04T09:17:19.1303324Z  * [new tag]                 trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b
2025-12-04T09:17:19.1304900Z  * [new tag]                 trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7
2025-12-04T09:17:19.1306844Z  * [new tag]                 trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307
2025-12-04T09:17:19.1308715Z  * [new tag]                 trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009
2025-12-04T09:17:19.1310323Z  * [new tag]                 trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:17:19.1311049Z  * [new tag]                 v0.1.1                      -> v0.1.1
2025-12-04T09:17:19.1312682Z  * [new tag]                 v0.1.10                     -> v0.1.10
2025-12-04T09:17:19.1314057Z  * [new tag]                 v0.1.11                     -> v0.1.11
2025-12-04T09:17:19.1315638Z  * [new tag]                 v0.1.12                     -> v0.1.12
2025-12-04T09:17:19.1316972Z  * [new tag]                 v0.1.2                      -> v0.1.2
2025-12-04T09:17:19.1318349Z  * [new tag]                 v0.1.3                      -> v0.1.3
2025-12-04T09:17:19.1319731Z  * [new tag]                 v0.1.4                      -> v0.1.4
2025-12-04T09:17:19.1321152Z  * [new tag]                 v0.1.5                      -> v0.1.5
2025-12-04T09:17:19.1322556Z  * [new tag]                 v0.1.6                      -> v0.1.6
2025-12-04T09:17:19.1323886Z  * [new tag]                 v0.1.7                      -> v0.1.7
2025-12-04T09:17:19.1325323Z  * [new tag]                 v0.1.8                      -> v0.1.8
2025-12-04T09:17:19.1326681Z  * [new tag]                 v0.1.9                      -> v0.1.9
2025-12-04T09:17:19.1328170Z  * [new tag]                 v0.2.0                      -> v0.2.0
2025-12-04T09:17:19.1329599Z  * [new tag]                 v0.3.0                      -> v0.3.0
2025-12-04T09:17:19.1331088Z  * [new tag]                 v0.3.1                      -> v0.3.1
2025-12-04T09:17:19.1332676Z  * [new tag]                 v0.4.0                      -> v0.4.0
2025-12-04T09:17:19.1334149Z  * [new tag]                 v0.4.1                      -> v0.4.1
2025-12-04T09:17:19.1335566Z  * [new tag]                 v1.0.0                      -> v1.0.0
2025-12-04T09:17:19.1336957Z  * [new tag]                 v1.0.0a0                    -> v1.0.0a0
2025-12-04T09:17:19.1338373Z  * [new tag]                 v1.0.1                      -> v1.0.1
2025-12-04T09:17:19.1339950Z  * [new tag]                 v1.0rc0                     -> v1.0rc0
2025-12-04T09:17:19.1341143Z  * [new tag]                 v1.0rc1                     -> v1.0rc1
2025-12-04T09:17:19.1342570Z  * [new tag]                 v1.1.0                      -> v1.1.0
2025-12-04T09:17:19.1344446Z  * [new tag]                 v1.1.0a0                    -> v1.1.0a0
2025-12-04T09:17:19.1346087Z  * [new tag]                 v1.10.0                     -> v1.10.0
2025-12-04T09:17:19.1347616Z  * [new tag]                 v1.10.0-rc1                 -> v1.10.0-rc1
2025-12-04T09:17:19.1349036Z  * [new tag]                 v1.10.0-rc2                 -> v1.10.0-rc2
2025-12-04T09:17:19.1350248Z  * [new tag]                 v1.10.0-rc3                 -> v1.10.0-rc3
2025-12-04T09:17:19.1351743Z  * [new tag]                 v1.10.1                     -> v1.10.1
2025-12-04T09:17:19.1352950Z  * [new tag]                 v1.10.1-rc1                 -> v1.10.1-rc1
2025-12-04T09:17:19.1354168Z  * [new tag]                 v1.10.2                     -> v1.10.2
2025-12-04T09:17:19.1355381Z  * [new tag]                 v1.10.2-rc1                 -> v1.10.2-rc1
2025-12-04T09:17:19.1356854Z  * [new tag]                 v1.11.0                     -> v1.11.0
2025-12-04T09:17:19.1358271Z  * [new tag]                 v1.11.0-rc1                 -> v1.11.0-rc1
2025-12-04T09:17:19.1359882Z  * [new tag]                 v1.11.0-rc2                 -> v1.11.0-rc2
2025-12-04T09:17:19.1361463Z  * [new tag]                 v1.11.0-rc3                 -> v1.11.0-rc3
2025-12-04T09:17:19.1362954Z  * [new tag]                 v1.11.0-rc4                 -> v1.11.0-rc4
2025-12-04T09:17:19.1364404Z  * [new tag]                 v1.11.0-rc5                 -> v1.11.0-rc5
2025-12-04T09:17:19.1365648Z  * [new tag]                 v1.11.0-rc6                 -> v1.11.0-rc6
2025-12-04T09:17:19.1366860Z  * [new tag]                 v1.11.0-rc7                 -> v1.11.0-rc7
2025-12-04T09:17:19.1368481Z  * [new tag]                 v1.12.0                     -> v1.12.0
2025-12-04T09:17:19.1369840Z  * [new tag]                 v1.12.0-rc1                 -> v1.12.0-rc1
2025-12-04T09:17:19.1371353Z  * [new tag]                 v1.12.0-rc2                 -> v1.12.0-rc2
2025-12-04T09:17:19.1372899Z  * [new tag]                 v1.12.0-rc3                 -> v1.12.0-rc3
2025-12-04T09:17:19.1374374Z  * [new tag]                 v1.12.0-rc4                 -> v1.12.0-rc4
2025-12-04T09:17:19.1375808Z  * [new tag]                 v1.12.0-rc5                 -> v1.12.0-rc5
2025-12-04T09:17:19.1377388Z  * [new tag]                 v1.12.0-rc6                 -> v1.12.0-rc6
2025-12-04T09:17:19.1378615Z  * [new tag]                 v1.12.0-rc7                 -> v1.12.0-rc7
2025-12-04T09:17:19.1379975Z  * [new tag]                 v1.12.0-rc8                 -> v1.12.0-rc8
2025-12-04T09:17:19.1381168Z  * [new tag]                 v1.12.1                     -> v1.12.1
2025-12-04T09:17:19.1382732Z  * [new tag]                 v1.12.1-rc1                 -> v1.12.1-rc1
2025-12-04T09:17:19.1384447Z  * [new tag]                 v1.12.1-rc2                 -> v1.12.1-rc2
2025-12-04T09:17:19.1385969Z  * [new tag]                 v1.12.1-rc3                 -> v1.12.1-rc3
2025-12-04T09:17:19.1387426Z  * [new tag]                 v1.12.1-rc4                 -> v1.12.1-rc4
2025-12-04T09:17:19.1388645Z  * [new tag]                 v1.12.1-rc5                 -> v1.12.1-rc5
2025-12-04T09:17:19.1390129Z  * [new tag]                 v1.13.0                     -> v1.13.0
2025-12-04T09:17:19.1391543Z  * [new tag]                 v1.13.0-rc1                 -> v1.13.0-rc1
2025-12-04T09:17:19.1392914Z  * [new tag]                 v1.13.0-rc2                 -> v1.13.0-rc2
2025-12-04T09:17:19.1394273Z  * [new tag]                 v1.13.0-rc3                 -> v1.13.0-rc3
2025-12-04T09:17:19.1395878Z  * [new tag]                 v1.13.0-rc4                 -> v1.13.0-rc4
2025-12-04T09:17:19.1397131Z  * [new tag]                 v1.13.0-rc5                 -> v1.13.0-rc5
2025-12-04T09:17:19.1398347Z  * [new tag]                 v1.13.0-rc6                 -> v1.13.0-rc6
2025-12-04T09:17:19.1399835Z  * [new tag]                 v1.13.1                     -> v1.13.1
2025-12-04T09:17:19.1401071Z  * [new tag]                 v1.13.1-rc1                 -> v1.13.1-rc1
2025-12-04T09:17:19.1402513Z  * [new tag]                 v1.2.0                      -> v1.2.0
2025-12-04T09:17:19.1403945Z  * [new tag]                 v1.2.0a0                    -> v1.2.0a0
2025-12-04T09:17:19.1405344Z  * [new tag]                 v1.3.0                      -> v1.3.0
2025-12-04T09:17:19.1406873Z  * [new tag]                 v1.3.0a0                    -> v1.3.0a0
2025-12-04T09:17:19.1408249Z  * [new tag]                 v1.3.1                      -> v1.3.1
2025-12-04T09:17:19.1413176Z  * [new tag]                 v1.4.0                      -> v1.4.0
2025-12-04T09:17:19.1414608Z  * [new tag]                 v1.4.0a0                    -> v1.4.0a0
2025-12-04T09:17:19.1415827Z  * [new tag]                 v1.4.1                      -> v1.4.1
2025-12-04T09:17:19.1417335Z  * [new tag]                 v1.5.0                      -> v1.5.0
2025-12-04T09:17:19.1418816Z  * [new tag]                 v1.5.0-rc1                  -> v1.5.0-rc1
2025-12-04T09:17:19.1420462Z  * [new tag]                 v1.5.0-rc2                  -> v1.5.0-rc2
2025-12-04T09:17:19.1421972Z  * [new tag]                 v1.5.0-rc3                  -> v1.5.0-rc3
2025-12-04T09:17:19.1423311Z  * [new tag]                 v1.5.0-rc4                  -> v1.5.0-rc4
2025-12-04T09:17:19.1424512Z  * [new tag]                 v1.5.0-rc5                  -> v1.5.0-rc5
2025-12-04T09:17:19.1426039Z  * [new tag]                 v1.5.1                      -> v1.5.1
2025-12-04T09:17:19.1427291Z  * [new tag]                 v1.5.1-rc1                  -> v1.5.1-rc1
2025-12-04T09:17:19.1428464Z  * [new tag]                 v1.6.0                      -> v1.6.0
2025-12-04T09:17:19.1429958Z  * [new tag]                 v1.6.0-rc1                  -> v1.6.0-rc1
2025-12-04T09:17:19.1431591Z  * [new tag]                 v1.6.0-rc2                  -> v1.6.0-rc2
2025-12-04T09:17:19.1432947Z  * [new tag]                 v1.6.0-rc3                  -> v1.6.0-rc3
2025-12-04T09:17:19.1434337Z  * [new tag]                 v1.6.0-rc4                  -> v1.6.0-rc4
2025-12-04T09:17:19.1435920Z  * [new tag]                 v1.6.0-rc5                  -> v1.6.0-rc5
2025-12-04T09:17:19.1437302Z  * [new tag]                 v1.6.0-rc6                  -> v1.6.0-rc6
2025-12-04T09:17:19.1439039Z  * [new tag]                 v1.6.0-rc7                  -> v1.6.0-rc7
2025-12-04T09:17:19.1440555Z  * [new tag]                 v1.7.0                      -> v1.7.0
2025-12-04T09:17:19.1442031Z  * [new tag]                 v1.7.0-rc1                  -> v1.7.0-rc1
2025-12-04T09:17:19.1443554Z  * [new tag]                 v1.7.0-rc2                  -> v1.7.0-rc2
2025-12-04T09:17:19.1445029Z  * [new tag]                 v1.7.0-rc3                  -> v1.7.0-rc3
2025-12-04T09:17:19.1446257Z  * [new tag]                 v1.7.0-rc4                  -> v1.7.0-rc4
2025-12-04T09:17:19.1447724Z  * [new tag]                 v1.7.1                      -> v1.7.1
2025-12-04T09:17:19.1449292Z  * [new tag]                 v1.7.1-rc1                  -> v1.7.1-rc1
2025-12-04T09:17:19.1450748Z  * [new tag]                 v1.7.1-rc2                  -> v1.7.1-rc2
2025-12-04T09:17:19.1451987Z  * [new tag]                 v1.7.1-rc3                  -> v1.7.1-rc3
2025-12-04T09:17:19.1453482Z  * [new tag]                 v1.8.0                      -> v1.8.0
2025-12-04T09:17:19.1454720Z  * [new tag]                 v1.8.0-rc1                  -> v1.8.0-rc1
2025-12-04T09:17:19.1456283Z  * [new tag]                 v1.8.0-rc2                  -> v1.8.0-rc2
2025-12-04T09:17:19.1457727Z  * [new tag]                 v1.8.0-rc3                  -> v1.8.0-rc3
2025-12-04T09:17:19.1459176Z  * [new tag]                 v1.8.0-rc4                  -> v1.8.0-rc4
2025-12-04T09:17:19.1460434Z  * [new tag]                 v1.8.0-rc5                  -> v1.8.0-rc5
2025-12-04T09:17:19.1461671Z  * [new tag]                 v1.8.1                      -> v1.8.1
2025-12-04T09:17:19.1463168Z  * [new tag]                 v1.8.1-rc1                  -> v1.8.1-rc1
2025-12-04T09:17:19.1464391Z  * [new tag]                 v1.8.1-rc2                  -> v1.8.1-rc2
2025-12-04T09:17:19.1465653Z  * [new tag]                 v1.8.1-rc3                  -> v1.8.1-rc3
2025-12-04T09:17:19.1467550Z  * [new tag]                 v1.8.2                      -> v1.8.2
2025-12-04T09:17:19.1468784Z  * [new tag]                 v1.8.2-rc1                  -> v1.8.2-rc1
2025-12-04T09:17:19.1470277Z  * [new tag]                 v1.9.0                      -> v1.9.0
2025-12-04T09:17:19.1471768Z  * [new tag]                 v1.9.0-rc1                  -> v1.9.0-rc1
2025-12-04T09:17:19.1473259Z  * [new tag]                 v1.9.0-rc2                  -> v1.9.0-rc2
2025-12-04T09:17:19.1474770Z  * [new tag]                 v1.9.0-rc3                  -> v1.9.0-rc3
2025-12-04T09:17:19.1476026Z  * [new tag]                 v1.9.0-rc4                  -> v1.9.0-rc4
2025-12-04T09:17:19.1477479Z  * [new tag]                 v1.9.1                      -> v1.9.1
2025-12-04T09:17:19.1479101Z  * [new tag]                 v1.9.1-rc1                  -> v1.9.1-rc1
2025-12-04T09:17:19.1480325Z  * [new tag]                 v1.9.1-rc2                  -> v1.9.1-rc2
2025-12-04T09:17:19.1481885Z  * [new tag]                 v2.0.0                      -> v2.0.0
2025-12-04T09:17:19.1483239Z  * [new tag]                 v2.0.0-rc1                  -> v2.0.0-rc1
2025-12-04T09:17:19.1484751Z  * [new tag]                 v2.0.0-rc2                  -> v2.0.0-rc2
2025-12-04T09:17:19.1486223Z  * [new tag]                 v2.0.0-rc3                  -> v2.0.0-rc3
2025-12-04T09:17:19.1487692Z  * [new tag]                 v2.0.0-rc4                  -> v2.0.0-rc4
2025-12-04T09:17:19.1489184Z  * [new tag]                 v2.0.0-rc5                  -> v2.0.0-rc5
2025-12-04T09:17:19.1490530Z  * [new tag]                 v2.0.0-rc6                  -> v2.0.0-rc6
2025-12-04T09:17:19.1491963Z  * [new tag]                 v2.0.1                      -> v2.0.1
2025-12-04T09:17:19.1493420Z  * [new tag]                 v2.0.1-rc1                  -> v2.0.1-rc1
2025-12-04T09:17:19.1494410Z  * [new tag]                 v2.0.1-rc2                  -> v2.0.1-rc2
2025-12-04T09:17:19.1496130Z  * [new tag]                 v2.0.1-rc3                  -> v2.0.1-rc3
2025-12-04T09:17:19.1497387Z  * [new tag]                 v2.0.1-rc4                  -> v2.0.1-rc4
2025-12-04T09:17:19.1499361Z  * [new tag]                 v2.1.0                      -> v2.1.0
2025-12-04T09:17:19.1500849Z  * [new tag]                 v2.1.0-rc1                  -> v2.1.0-rc1
2025-12-04T09:17:19.1502448Z  * [new tag]                 v2.1.0-rc2                  -> v2.1.0-rc2
2025-12-04T09:17:19.1504043Z  * [new tag]                 v2.1.0-rc3                  -> v2.1.0-rc3
2025-12-04T09:17:19.1505510Z  * [new tag]                 v2.1.0-rc4                  -> v2.1.0-rc4
2025-12-04T09:17:19.1506963Z  * [new tag]                 v2.1.0-rc5                  -> v2.1.0-rc5
2025-12-04T09:17:19.1508181Z  * [new tag]                 v2.1.0-rc6                  -> v2.1.0-rc6
2025-12-04T09:17:19.1510052Z  * [new tag]                 v2.1.1                      -> v2.1.1
2025-12-04T09:17:19.1511547Z  * [new tag]                 v2.1.1-rc1                  -> v2.1.1-rc1
2025-12-04T09:17:19.1513073Z  * [new tag]                 v2.1.1-rc2                  -> v2.1.1-rc2
2025-12-04T09:17:19.1514643Z  * [new tag]                 v2.1.1-rc3                  -> v2.1.1-rc3
2025-12-04T09:17:19.1516223Z  * [new tag]                 v2.1.1-rc4                  -> v2.1.1-rc4
2025-12-04T09:17:19.1517600Z  * [new tag]                 v2.1.1-rc5                  -> v2.1.1-rc5
2025-12-04T09:17:19.1518816Z  * [new tag]                 v2.1.1-rc6                  -> v2.1.1-rc6
2025-12-04T09:17:19.1520250Z  * [new tag]                 v2.1.2                      -> v2.1.2
2025-12-04T09:17:19.1521877Z  * [new tag]                 v2.1.2-rc1                  -> v2.1.2-rc1
2025-12-04T09:17:19.1523275Z  * [new tag]                 v2.1.2-rc2                  -> v2.1.2-rc2
2025-12-04T09:17:19.1524527Z  * [new tag]                 v2.1.2-rc3                  -> v2.1.2-rc3
2025-12-04T09:17:19.1526088Z  * [new tag]                 v2.2.0                      -> v2.2.0
2025-12-04T09:17:19.1527562Z  * [new tag]                 v2.2.0-rc1                  -> v2.2.0-rc1
2025-12-04T09:17:19.1528941Z  * [new tag]                 v2.2.0-rc2                  -> v2.2.0-rc2
2025-12-04T09:17:19.1530320Z  * [new tag]                 v2.2.0-rc3                  -> v2.2.0-rc3
2025-12-04T09:17:19.1532292Z  * [new tag]                 v2.2.0-rc4                  -> v2.2.0-rc4
2025-12-04T09:17:19.1533759Z  * [new tag]                 v2.2.0-rc5                  -> v2.2.0-rc5
2025-12-04T09:17:19.1535193Z  * [new tag]                 v2.2.0-rc6                  -> v2.2.0-rc6
2025-12-04T09:17:19.1536450Z  * [new tag]                 v2.2.0-rc7                  -> v2.2.0-rc7
2025-12-04T09:17:19.1537673Z  * [new tag]                 v2.2.0-rc8                  -> v2.2.0-rc8
2025-12-04T09:17:19.1539341Z  * [new tag]                 v2.2.1                      -> v2.2.1
2025-12-04T09:17:19.1540834Z  * [new tag]                 v2.2.1-rc1                  -> v2.2.1-rc1
2025-12-04T09:17:19.1542098Z  * [new tag]                 v2.2.1-rc2                  -> v2.2.1-rc2
2025-12-04T09:17:19.1543317Z  * [new tag]                 v2.2.1-rc3                  -> v2.2.1-rc3
2025-12-04T09:17:19.1544585Z  * [new tag]                 v2.2.2                      -> v2.2.2
2025-12-04T09:17:19.1546163Z  * [new tag]                 v2.2.2-rc1                  -> v2.2.2-rc1
2025-12-04T09:17:19.1547446Z  * [new tag]                 v2.2.2-rc2                  -> v2.2.2-rc2
2025-12-04T09:17:19.1548743Z  * [new tag]                 v2.2.2-rc3                  -> v2.2.2-rc3
2025-12-04T09:17:19.1550468Z  * [new tag]                 v2.3.0                      -> v2.3.0
2025-12-04T09:17:19.1551720Z  * [new tag]                 v2.3.0-rc1                  -> v2.3.0-rc1
2025-12-04T09:17:19.1553239Z  * [new tag]                 v2.3.0-rc10                 -> v2.3.0-rc10
2025-12-04T09:17:19.1554831Z  * [new tag]                 v2.3.0-rc11                 -> v2.3.0-rc11
2025-12-04T09:17:19.1555865Z  * [new tag]                 v2.3.0-rc12                 -> v2.3.0-rc12
2025-12-04T09:17:19.1557588Z  * [new tag]                 v2.3.0-rc2                  -> v2.3.0-rc2
2025-12-04T09:17:19.1559095Z  * [new tag]                 v2.3.0-rc3                  -> v2.3.0-rc3
2025-12-04T09:17:19.1560520Z  * [new tag]                 v2.3.0-rc4                  -> v2.3.0-rc4
2025-12-04T09:17:19.1561996Z  * [new tag]                 v2.3.0-rc5                  -> v2.3.0-rc5
2025-12-04T09:17:19.1563241Z  * [new tag]                 v2.3.0-rc6                  -> v2.3.0-rc6
2025-12-04T09:17:19.1564772Z  * [new tag]                 v2.3.0-rc7                  -> v2.3.0-rc7
2025-12-04T09:17:19.1566234Z  * [new tag]                 v2.3.0-rc8                  -> v2.3.0-rc8
2025-12-04T09:17:19.1567501Z  * [new tag]                 v2.3.0-rc9                  -> v2.3.0-rc9
2025-12-04T09:17:19.1568709Z  * [new tag]                 v2.3.1                      -> v2.3.1
2025-12-04T09:17:19.1570232Z  * [new tag]                 v2.3.1-rc1                  -> v2.3.1-rc1
2025-12-04T09:17:19.1571730Z  * [new tag]                 v2.3.1-rc2                  -> v2.3.1-rc2
2025-12-04T09:17:19.1573278Z  * [new tag]                 v2.3.1-rc3                  -> v2.3.1-rc3
2025-12-04T09:17:19.1574696Z  * [new tag]                 v2.4.0                      -> v2.4.0
2025-12-04T09:17:19.1576212Z  * [new tag]                 v2.4.0-rc1                  -> v2.4.0-rc1
2025-12-04T09:17:19.1577631Z  * [new tag]                 v2.4.0-rc2                  -> v2.4.0-rc2
2025-12-04T09:17:19.1579253Z  * [new tag]                 v2.4.0-rc3                  -> v2.4.0-rc3
2025-12-04T09:17:19.1580643Z  * [new tag]                 v2.4.0-rc4                  -> v2.4.0-rc4
2025-12-04T09:17:19.1582137Z  * [new tag]                 v2.4.0-rc5                  -> v2.4.0-rc5
2025-12-04T09:17:19.1583837Z  * [new tag]                 v2.4.0-rc6                  -> v2.4.0-rc6
2025-12-04T09:17:19.1585446Z  * [new tag]                 v2.4.0-rc7                  -> v2.4.0-rc7
2025-12-04T09:17:19.1586910Z  * [new tag]                 v2.4.0-rc8                  -> v2.4.0-rc8
2025-12-04T09:17:19.1588410Z  * [new tag]                 v2.4.0-rc9                  -> v2.4.0-rc9
2025-12-04T09:17:19.1589667Z  * [new tag]                 v2.4.1                      -> v2.4.1
2025-12-04T09:17:19.1591237Z  * [new tag]                 v2.4.1-rc1                  -> v2.4.1-rc1
2025-12-04T09:17:19.1592700Z  * [new tag]                 v2.4.1-rc2                  -> v2.4.1-rc2
2025-12-04T09:17:19.1594186Z  * [new tag]                 v2.4.1-rc3                  -> v2.4.1-rc3
2025-12-04T09:17:19.1595778Z  * [new tag]                 v2.5.0                      -> v2.5.0
2025-12-04T09:17:19.1597241Z  * [new tag]                 v2.5.0-rc1                  -> v2.5.0-rc1
2025-12-04T09:17:19.1598441Z  * [new tag]                 v2.5.0-rc10                 -> v2.5.0-rc10
2025-12-04T09:17:19.1599847Z  * [new tag]                 v2.5.0-rc2                  -> v2.5.0-rc2
2025-12-04T09:17:19.1601316Z  * [new tag]                 v2.5.0-rc3                  -> v2.5.0-rc3
2025-12-04T09:17:19.1602839Z  * [new tag]                 v2.5.0-rc4                  -> v2.5.0-rc4
2025-12-04T09:17:19.1604254Z  * [new tag]                 v2.5.0-rc5                  -> v2.5.0-rc5
2025-12-04T09:17:19.1605785Z  * [new tag]                 v2.5.0-rc6                  -> v2.5.0-rc6
2025-12-04T09:17:19.1607238Z  * [new tag]                 v2.5.0-rc7                  -> v2.5.0-rc7
2025-12-04T09:17:19.1608849Z  * [new tag]                 v2.5.0-rc8                  -> v2.5.0-rc8
2025-12-04T09:17:19.1610607Z  * [new tag]                 v2.5.0-rc9                  -> v2.5.0-rc9
2025-12-04T09:17:19.1611465Z  * [new tag]                 v2.5.1                      -> v2.5.1
2025-12-04T09:17:19.1612945Z  * [new tag]                 v2.5.1-rc1                  -> v2.5.1-rc1
2025-12-04T09:17:19.1614205Z  * [new tag]                 v2.6.0                      -> v2.6.0
2025-12-04T09:17:19.1615720Z  * [new tag]                 v2.6.0-rc1                  -> v2.6.0-rc1
2025-12-04T09:17:19.1617207Z  * [new tag]                 v2.6.0-rc2                  -> v2.6.0-rc2
2025-12-04T09:17:19.1618673Z  * [new tag]                 v2.6.0-rc3                  -> v2.6.0-rc3
2025-12-04T09:17:19.1620333Z  * [new tag]                 v2.6.0-rc4                  -> v2.6.0-rc4
2025-12-04T09:17:19.1621975Z  * [new tag]                 v2.6.0-rc5                  -> v2.6.0-rc5
2025-12-04T09:17:19.1623538Z  * [new tag]                 v2.6.0-rc6                  -> v2.6.0-rc6
2025-12-04T09:17:19.1625513Z  * [new tag]                 v2.6.0-rc7                  -> v2.6.0-rc7
2025-12-04T09:17:19.1627132Z  * [new tag]                 v2.6.0-rc8                  -> v2.6.0-rc8
2025-12-04T09:17:19.1628584Z  * [new tag]                 v2.6.0-rc9                  -> v2.6.0-rc9
2025-12-04T09:17:19.1630235Z  * [new tag]                 v2.7.0                      -> v2.7.0
2025-12-04T09:17:19.1631684Z  * [new tag]                 v2.7.0-rc1                  -> v2.7.0-rc1
2025-12-04T09:17:19.1632971Z  * [new tag]                 v2.7.0-rc10                 -> v2.7.0-rc10
2025-12-04T09:17:19.1634536Z  * [new tag]                 v2.7.0-rc2                  -> v2.7.0-rc2
2025-12-04T09:17:19.1636013Z  * [new tag]                 v2.7.0-rc3                  -> v2.7.0-rc3
2025-12-04T09:17:19.1637474Z  * [new tag]                 v2.7.0-rc4                  -> v2.7.0-rc4
2025-12-04T09:17:19.1638988Z  * [new tag]                 v2.7.0-rc5                  -> v2.7.0-rc5
2025-12-04T09:17:19.1640390Z  * [new tag]                 v2.7.0-rc6                  -> v2.7.0-rc6
2025-12-04T09:17:19.1641863Z  * [new tag]                 v2.7.0-rc7                  -> v2.7.0-rc7
2025-12-04T09:17:19.1643508Z  * [new tag]                 v2.7.0-rc8                  -> v2.7.0-rc8
2025-12-04T09:17:19.1645084Z  * [new tag]                 v2.7.0-rc9                  -> v2.7.0-rc9
2025-12-04T09:17:19.1646349Z  * [new tag]                 v2.7.1                      -> v2.7.1
2025-12-04T09:17:19.1647921Z  * [new tag]                 v2.7.1-rc1                  -> v2.7.1-rc1
2025-12-04T09:17:19.1649444Z  * [new tag]                 v2.7.1-rc2                  -> v2.7.1-rc2
2025-12-04T09:17:19.1651115Z  * [new tag]                 v2.7.1-rc3                  -> v2.7.1-rc3
2025-12-04T09:17:19.1652654Z  * [new tag]                 v2.7.1-rc4                  -> v2.7.1-rc4
2025-12-04T09:17:19.1654144Z  * [new tag]                 v2.7.1-rc5                  -> v2.7.1-rc5
2025-12-04T09:17:19.1655419Z  * [new tag]                 v2.8.0                      -> v2.8.0
2025-12-04T09:17:19.1657004Z  * [new tag]                 v2.8.0-rc1                  -> v2.8.0-rc1
2025-12-04T09:17:19.1658420Z  * [new tag]                 v2.8.0-rc2                  -> v2.8.0-rc2
2025-12-04T09:17:19.1660106Z  * [new tag]                 v2.8.0-rc3                  -> v2.8.0-rc3
2025-12-04T09:17:19.1661715Z  * [new tag]                 v2.8.0-rc4                  -> v2.8.0-rc4
2025-12-04T09:17:19.1663238Z  * [new tag]                 v2.8.0-rc5                  -> v2.8.0-rc5
2025-12-04T09:17:19.1664758Z  * [new tag]                 v2.8.0-rc6                  -> v2.8.0-rc6
2025-12-04T09:17:19.1666309Z  * [new tag]                 v2.8.0-rc7                  -> v2.8.0-rc7
2025-12-04T09:17:19.1667772Z  * [new tag]                 v2.8.0-rc8                  -> v2.8.0-rc8
2025-12-04T09:17:19.1669363Z  * [new tag]                 v2.9.0                      -> v2.9.0
2025-12-04T09:17:19.1670845Z  * [new tag]                 v2.9.0-rc1                  -> v2.9.0-rc1
2025-12-04T09:17:19.1672438Z  * [new tag]                 v2.9.0-rc10                 -> v2.9.0-rc10
2025-12-04T09:17:19.1673893Z  * [new tag]                 v2.9.0-rc11                 -> v2.9.0-rc11
2025-12-04T09:17:19.1675631Z  * [new tag]                 v2.9.0-rc2                  -> v2.9.0-rc2
2025-12-04T09:17:19.1677174Z  * [new tag]                 v2.9.0-rc3                  -> v2.9.0-rc3
2025-12-04T09:17:19.1678692Z  * [new tag]                 v2.9.0-rc4                  -> v2.9.0-rc4
2025-12-04T09:17:19.1680211Z  * [new tag]                 v2.9.0-rc5                  -> v2.9.0-rc5
2025-12-04T09:17:19.1681974Z  * [new tag]                 v2.9.0-rc6                  -> v2.9.0-rc6
2025-12-04T09:17:19.1683460Z  * [new tag]                 v2.9.0-rc7                  -> v2.9.0-rc7
2025-12-04T09:17:19.1685162Z  * [new tag]                 v2.9.0-rc8                  -> v2.9.0-rc8
2025-12-04T09:17:19.1686439Z  * [new tag]                 v2.9.0-rc9                  -> v2.9.0-rc9
2025-12-04T09:17:19.1687744Z  * [new tag]                 v2.9.1                      -> v2.9.1
2025-12-04T09:17:19.1689221Z  * [new tag]                 v2.9.1-rc1                  -> v2.9.1-rc1
2025-12-04T09:17:19.1690780Z  * [new tag]                 v2.9.1-rc2                  -> v2.9.1-rc2
2025-12-04T09:17:19.1693396Z  * [new tag]                 viable/strict/1759343184    -> viable/strict/1759343184
2025-12-04T09:17:19.1694842Z  * [new tag]                 viable/strict/1759346540    -> viable/strict/1759346540
2025-12-04T09:17:19.1696190Z  * [new tag]                 viable/strict/1759348181    -> viable/strict/1759348181
2025-12-04T09:17:19.1697606Z  * [new tag]                 viable/strict/1759350324    -> viable/strict/1759350324
2025-12-04T09:17:19.1699099Z  * [new tag]                 viable/strict/1759351793    -> viable/strict/1759351793
2025-12-04T09:17:19.1700621Z  * [new tag]                 viable/strict/1759353844    -> viable/strict/1759353844
2025-12-04T09:17:19.1701964Z  * [new tag]                 viable/strict/1759355374    -> viable/strict/1759355374
2025-12-04T09:17:19.1703352Z  * [new tag]                 viable/strict/1759357472    -> viable/strict/1759357472
2025-12-04T09:17:19.1704728Z  * [new tag]                 viable/strict/1759361002    -> viable/strict/1759361002
2025-12-04T09:17:19.1706569Z  * [new tag]                 viable/strict/1759362585    -> viable/strict/1759362585
2025-12-04T09:17:19.1708240Z  * [new tag]                 viable/strict/1759365359    -> viable/strict/1759365359
2025-12-04T09:17:19.1709991Z  * [new tag]                 viable/strict/1759370089    -> viable/strict/1759370089
2025-12-04T09:17:19.1711478Z  * [new tag]                 viable/strict/1759377554    -> viable/strict/1759377554
2025-12-04T09:17:19.1713001Z  * [new tag]                 viable/strict/1759379133    -> viable/strict/1759379133
2025-12-04T09:17:19.1714446Z  * [new tag]                 viable/strict/1759389871    -> viable/strict/1759389871
2025-12-04T09:17:19.1716023Z  * [new tag]                 viable/strict/1759393562    -> viable/strict/1759393562
2025-12-04T09:17:19.1717512Z  * [new tag]                 viable/strict/1759395076    -> viable/strict/1759395076
2025-12-04T09:17:19.1719015Z  * [new tag]                 viable/strict/1759398579    -> viable/strict/1759398579
2025-12-04T09:17:19.1720521Z  * [new tag]                 viable/strict/1759404142    -> viable/strict/1759404142
2025-12-04T09:17:19.1721961Z  * [new tag]                 viable/strict/1759405773    -> viable/strict/1759405773
2025-12-04T09:17:19.1723472Z  * [new tag]                 viable/strict/1759408041    -> viable/strict/1759408041
2025-12-04T09:17:19.1724948Z  * [new tag]                 viable/strict/1759411593    -> viable/strict/1759411593
2025-12-04T09:17:19.1726372Z  * [new tag]                 viable/strict/1759427395    -> viable/strict/1759427395
2025-12-04T09:17:19.1727832Z  * [new tag]                 viable/strict/1759434582    -> viable/strict/1759434582
2025-12-04T09:17:19.1729353Z  * [new tag]                 viable/strict/1759436720    -> viable/strict/1759436720
2025-12-04T09:17:19.1730966Z  * [new tag]                 viable/strict/1759440219    -> viable/strict/1759440219
2025-12-04T09:17:19.1732368Z  * [new tag]                 viable/strict/1759441948    -> viable/strict/1759441948
2025-12-04T09:17:19.1733836Z  * [new tag]                 viable/strict/1759443860    -> viable/strict/1759443860
2025-12-04T09:17:19.1735365Z  * [new tag]                 viable/strict/1759445377    -> viable/strict/1759445377
2025-12-04T09:17:19.1736866Z  * [new tag]                 viable/strict/1759447415    -> viable/strict/1759447415
2025-12-04T09:17:19.1744420Z  * [new tag]                 viable/strict/1759451750    -> viable/strict/1759451750
2025-12-04T09:17:19.1744769Z  * [new tag]                 viable/strict/1759453910    -> viable/strict/1759453910
2025-12-04T09:17:19.1744962Z  * [new tag]                 viable/strict/1759456483    -> viable/strict/1759456483
2025-12-04T09:17:19.1745147Z  * [new tag]                 viable/strict/1759459279    -> viable/strict/1759459279
2025-12-04T09:17:19.1745335Z  * [new tag]                 viable/strict/1759460742    -> viable/strict/1759460742
2025-12-04T09:17:19.1745771Z  * [new tag]                 viable/strict/1759462025    -> viable/strict/1759462025
2025-12-04T09:17:19.1747880Z  * [new tag]                 viable/strict/1759469086    -> viable/strict/1759469086
2025-12-04T09:17:19.1748826Z  * [new tag]                 viable/strict/1759470581    -> viable/strict/1759470581
2025-12-04T09:17:19.1750508Z  * [new tag]                 viable/strict/1759472786    -> viable/strict/1759472786
2025-12-04T09:17:19.1751993Z  * [new tag]                 viable/strict/1759476294    -> viable/strict/1759476294
2025-12-04T09:17:19.1753480Z  * [new tag]                 viable/strict/1759479963    -> viable/strict/1759479963
2025-12-04T09:17:19.1754939Z  * [new tag]                 viable/strict/1759492177    -> viable/strict/1759492177
2025-12-04T09:17:19.1756377Z  * [new tag]                 viable/strict/1759519278    -> viable/strict/1759519278
2025-12-04T09:17:19.1757846Z  * [new tag]                 viable/strict/1759524580    -> viable/strict/1759524580
2025-12-04T09:17:19.1759270Z  * [new tag]                 viable/strict/1759528193    -> viable/strict/1759528193
2025-12-04T09:17:19.1760946Z  * [new tag]                 viable/strict/1759533797    -> viable/strict/1759533797
2025-12-04T09:17:19.1762463Z  * [new tag]                 viable/strict/1759542780    -> viable/strict/1759542780
2025-12-04T09:17:19.1763954Z  * [new tag]                 viable/strict/1759549779    -> viable/strict/1759549779
2025-12-04T09:17:19.1765445Z  * [new tag]                 viable/strict/1759555455    -> viable/strict/1759555455
2025-12-04T09:17:19.1766921Z  * [new tag]                 viable/strict/1759559176    -> viable/strict/1759559176
2025-12-04T09:17:19.1768406Z  * [new tag]                 viable/strict/1759560629    -> viable/strict/1759560629
2025-12-04T09:17:19.1769867Z  * [new tag]                 viable/strict/1759569848    -> viable/strict/1759569848
2025-12-04T09:17:19.1771599Z  * [new tag]                 viable/strict/1759571382    -> viable/strict/1759571382
2025-12-04T09:17:19.1773001Z  * [new tag]                 viable/strict/1759573474    -> viable/strict/1759573474
2025-12-04T09:17:19.1774460Z  * [new tag]                 viable/strict/1759618187    -> viable/strict/1759618187
2025-12-04T09:17:19.1775976Z  * [new tag]                 viable/strict/1759626742    -> viable/strict/1759626742
2025-12-04T09:17:19.1777536Z  * [new tag]                 viable/strict/1759632427    -> viable/strict/1759632427
2025-12-04T09:17:19.1779051Z  * [new tag]                 viable/strict/1759634971    -> viable/strict/1759634971
2025-12-04T09:17:19.1780693Z  * [new tag]                 viable/strict/1759661382    -> viable/strict/1759661382
2025-12-04T09:17:19.1782236Z  * [new tag]                 viable/strict/1759663294    -> viable/strict/1759663294
2025-12-04T09:17:19.1783539Z  * [new tag]                 viable/strict/1759708178    -> viable/strict/1759708178
2025-12-04T09:17:19.1785131Z  * [new tag]                 viable/strict/1759715695    -> viable/strict/1759715695
2025-12-04T09:17:19.1786658Z  * [new tag]                 viable/strict/1759728293    -> viable/strict/1759728293
2025-12-04T09:17:19.1788660Z  * [new tag]                 viable/strict/1759735513    -> viable/strict/1759735513
2025-12-04T09:17:19.1790278Z  * [new tag]                 viable/strict/1759739177    -> viable/strict/1759739177
2025-12-04T09:17:19.1791718Z  * [new tag]                 viable/strict/1759758635    -> viable/strict/1759758635
2025-12-04T09:17:19.1793205Z  * [new tag]                 viable/strict/1759765784    -> viable/strict/1759765784
2025-12-04T09:17:19.1794693Z  * [new tag]                 viable/strict/1759767948    -> viable/strict/1759767948
2025-12-04T09:17:19.1796230Z  * [new tag]                 viable/strict/1759771461    -> viable/strict/1759771461
2025-12-04T09:17:19.1797565Z  * [new tag]                 viable/strict/1759776706    -> viable/strict/1759776706
2025-12-04T09:17:19.1799137Z  * [new tag]                 viable/strict/1759782317    -> viable/strict/1759782317
2025-12-04T09:17:19.1800688Z  * [new tag]                 viable/strict/1759783777    -> viable/strict/1759783777
2025-12-04T09:17:19.1802258Z  * [new tag]                 viable/strict/1759785815    -> viable/strict/1759785815
2025-12-04T09:17:19.1803708Z  * [new tag]                 viable/strict/1759789459    -> viable/strict/1759789459
2025-12-04T09:17:19.1805248Z  * [new tag]                 viable/strict/1759790974    -> viable/strict/1759790974
2025-12-04T09:17:19.1806600Z  * [new tag]                 viable/strict/1759794583    -> viable/strict/1759794583
2025-12-04T09:17:19.1808132Z  * [new tag]                 viable/strict/1759797408    -> viable/strict/1759797408
2025-12-04T09:17:19.1811989Z  * [new tag]                 viable/strict/1759799518    -> viable/strict/1759799518
2025-12-04T09:17:19.1813463Z  * [new tag]                 viable/strict/1759804909    -> viable/strict/1759804909
2025-12-04T09:17:19.1814965Z  * [new tag]                 viable/strict/1759807643    -> viable/strict/1759807643
2025-12-04T09:17:19.1816461Z  * [new tag]                 viable/strict/1759809089    -> viable/strict/1759809089
2025-12-04T09:17:19.1817929Z  * [new tag]                 viable/strict/1759811145    -> viable/strict/1759811145
2025-12-04T09:17:19.1819593Z  * [new tag]                 viable/strict/1759812581    -> viable/strict/1759812581
2025-12-04T09:17:19.1821126Z  * [new tag]                 viable/strict/1759814683    -> viable/strict/1759814683
2025-12-04T09:17:19.1822615Z  * [new tag]                 viable/strict/1759821889    -> viable/strict/1759821889
2025-12-04T09:17:19.1824120Z  * [new tag]                 viable/strict/1759823376    -> viable/strict/1759823376
2025-12-04T09:17:19.1825595Z  * [new tag]                 viable/strict/1759827107    -> viable/strict/1759827107
2025-12-04T09:17:19.1827064Z  * [new tag]                 viable/strict/1759830577    -> viable/strict/1759830577
2025-12-04T09:17:19.1828687Z  * [new tag]                 viable/strict/1759832720    -> viable/strict/1759832720
2025-12-04T09:17:19.1830160Z  * [new tag]                 viable/strict/1759842063    -> viable/strict/1759842063
2025-12-04T09:17:19.1831622Z  * [new tag]                 viable/strict/1759847121    -> viable/strict/1759847121
2025-12-04T09:17:19.1833450Z  * [new tag]                 viable/strict/1759850721    -> viable/strict/1759850721
2025-12-04T09:17:19.1835123Z  * [new tag]                 viable/strict/1759857870    -> viable/strict/1759857870
2025-12-04T09:17:19.1836634Z  * [new tag]                 viable/strict/1759863143    -> viable/strict/1759863143
2025-12-04T09:17:19.1838183Z  * [new tag]                 viable/strict/1759875874    -> viable/strict/1759875874
2025-12-04T09:17:19.1839807Z  * [new tag]                 viable/strict/1759877385    -> viable/strict/1759877385
2025-12-04T09:17:19.1841300Z  * [new tag]                 viable/strict/1759883801    -> viable/strict/1759883801
2025-12-04T09:17:19.1842958Z  * [new tag]                 viable/strict/1759885922    -> viable/strict/1759885922
2025-12-04T09:17:19.1844323Z  * [new tag]                 viable/strict/1759888488    -> viable/strict/1759888488
2025-12-04T09:17:19.1845808Z  * [new tag]                 viable/strict/1759895471    -> viable/strict/1759895471
2025-12-04T09:17:19.1847329Z  * [new tag]                 viable/strict/1759904803    -> viable/strict/1759904803
2025-12-04T09:17:19.1848995Z  * [new tag]                 viable/strict/1759908300    -> viable/strict/1759908300
2025-12-04T09:17:19.1850536Z  * [new tag]                 viable/strict/1759915520    -> viable/strict/1759915520
2025-12-04T09:17:19.1852061Z  * [new tag]                 viable/strict/1759916978    -> viable/strict/1759916978
2025-12-04T09:17:19.1853396Z  * [new tag]                 viable/strict/1759930024    -> viable/strict/1759930024
2025-12-04T09:17:19.1854906Z  * [new tag]                 viable/strict/1759948122    -> viable/strict/1759948122
2025-12-04T09:17:19.1856411Z  * [new tag]                 viable/strict/1759952983    -> viable/strict/1759952983
2025-12-04T09:17:19.1857933Z  * [new tag]                 viable/strict/1759955121    -> viable/strict/1759955121
2025-12-04T09:17:19.1859508Z  * [new tag]                 viable/strict/1759962298    -> viable/strict/1759962298
2025-12-04T09:17:19.1861046Z  * [new tag]                 viable/strict/1759965837    -> viable/strict/1759965837
2025-12-04T09:17:19.1862602Z  * [new tag]                 viable/strict/1759970213    -> viable/strict/1759970213
2025-12-04T09:17:19.1864113Z  * [new tag]                 viable/strict/1759974894    -> viable/strict/1759974894
2025-12-04T09:17:19.1865583Z  * [new tag]                 viable/strict/1759977763    -> viable/strict/1759977763
2025-12-04T09:17:19.1867099Z  * [new tag]                 viable/strict/1759979241    -> viable/strict/1759979241
2025-12-04T09:17:19.1868644Z  * [new tag]                 viable/strict/1759985417    -> viable/strict/1759985417
2025-12-04T09:17:19.1870144Z  * [new tag]                 viable/strict/1759987490    -> viable/strict/1759987490
2025-12-04T09:17:19.1871635Z  * [new tag]                 viable/strict/1759996180    -> viable/strict/1759996180
2025-12-04T09:17:19.1873108Z  * [new tag]                 viable/strict/1760065682    -> viable/strict/1760065682
2025-12-04T09:17:19.1874610Z  * [new tag]                 viable/strict/1760066894    -> viable/strict/1760066894
2025-12-04T09:17:19.1876203Z  * [new tag]                 viable/strict/1760070345    -> viable/strict/1760070345
2025-12-04T09:17:19.1877701Z  * [new tag]                 viable/strict/1760089782    -> viable/strict/1760089782
2025-12-04T09:17:19.1879189Z  * [new tag]                 viable/strict/1760091921    -> viable/strict/1760091921
2025-12-04T09:17:19.1880655Z  * [new tag]                 viable/strict/1760127924    -> viable/strict/1760127924
2025-12-04T09:17:19.1882156Z  * [new tag]                 viable/strict/1760129489    -> viable/strict/1760129489
2025-12-04T09:17:19.1883705Z  * [new tag]                 viable/strict/1760132980    -> viable/strict/1760132980
2025-12-04T09:17:19.1885339Z  * [new tag]                 viable/strict/1760135060    -> viable/strict/1760135060
2025-12-04T09:17:19.1886920Z  * [new tag]                 viable/strict/1760215782    -> viable/strict/1760215782
2025-12-04T09:17:19.1888925Z  * [new tag]                 viable/strict/1760273849    -> viable/strict/1760273849
2025-12-04T09:17:19.1890420Z  * [new tag]                 viable/strict/1760275517    -> viable/strict/1760275517
2025-12-04T09:17:19.1891896Z  * [new tag]                 viable/strict/1760276979    -> viable/strict/1760276979
2025-12-04T09:17:19.1893452Z  * [new tag]                 viable/strict/1760279007    -> viable/strict/1760279007
2025-12-04T09:17:19.1894809Z  * [new tag]                 viable/strict/1760286328    -> viable/strict/1760286328
2025-12-04T09:17:19.1896152Z  * [new tag]                 viable/strict/1760493304    -> viable/strict/1760493304
2025-12-04T09:17:19.1897746Z  * [new tag]                 viable/strict/1760496298    -> viable/strict/1760496298
2025-12-04T09:17:19.1899197Z  * [new tag]                 viable/strict/1760518396    -> viable/strict/1760518396
2025-12-04T09:17:19.1900723Z  * [new tag]                 viable/strict/1760534864    -> viable/strict/1760534864
2025-12-04T09:17:19.1902238Z  * [new tag]                 viable/strict/1760549062    -> viable/strict/1760549062
2025-12-04T09:17:19.1903817Z  * [new tag]                 viable/strict/1760552799    -> viable/strict/1760552799
2025-12-04T09:17:19.1905378Z  * [new tag]                 viable/strict/1760554355    -> viable/strict/1760554355
2025-12-04T09:17:19.1906936Z  * [new tag]                 viable/strict/1760556275    -> viable/strict/1760556275
2025-12-04T09:17:19.1908427Z  * [new tag]                 viable/strict/1760564979    -> viable/strict/1760564979
2025-12-04T09:17:19.1910179Z  * [new tag]                 viable/strict/1760567049    -> viable/strict/1760567049
2025-12-04T09:17:19.1912074Z  * [new tag]                 viable/strict/1760568585    -> viable/strict/1760568585
2025-12-04T09:17:19.1913546Z  * [new tag]                 viable/strict/1760570630    -> viable/strict/1760570630
2025-12-04T09:17:19.1915021Z  * [new tag]                 viable/strict/1760572180    -> viable/strict/1760572180
2025-12-04T09:17:19.1916564Z  * [new tag]                 viable/strict/1760575094    -> viable/strict/1760575094
2025-12-04T09:17:19.1918150Z  * [new tag]                 viable/strict/1760579709    -> viable/strict/1760579709
2025-12-04T09:17:19.1920170Z  * [new tag]                 viable/strict/1760582614    -> viable/strict/1760582614
2025-12-04T09:17:19.1921626Z  * [new tag]                 viable/strict/1760586815    -> viable/strict/1760586815
2025-12-04T09:17:19.1922992Z  * [new tag]                 viable/strict/1760588829    -> viable/strict/1760588829
2025-12-04T09:17:19.1924531Z  * [new tag]                 viable/strict/1760590200    -> viable/strict/1760590200
2025-12-04T09:17:19.1926092Z  * [new tag]                 viable/strict/1760592311    -> viable/strict/1760592311
2025-12-04T09:17:19.1927556Z  * [new tag]                 viable/strict/1760619733    -> viable/strict/1760619733
2025-12-04T09:17:19.1928886Z  * [new tag]                 viable/strict/1760628335    -> viable/strict/1760628335
2025-12-04T09:17:19.1930381Z  * [new tag]                 viable/strict/1760635490    -> viable/strict/1760635490
2025-12-04T09:17:19.1931838Z  * [new tag]                 viable/strict/1760640743    -> viable/strict/1760640743
2025-12-04T09:17:19.1933381Z  * [new tag]                 viable/strict/1760642528    -> viable/strict/1760642528
2025-12-04T09:17:19.1934851Z  * [new tag]                 viable/strict/1760646330    -> viable/strict/1760646330
2025-12-04T09:17:19.1936517Z  * [new tag]                 viable/strict/1760666101    -> viable/strict/1760666101
2025-12-04T09:17:19.1937959Z  * [new tag]                 viable/strict/1760668990    -> viable/strict/1760668990
2025-12-04T09:17:19.1939527Z  * [new tag]                 viable/strict/1760670600    -> viable/strict/1760670600
2025-12-04T09:17:19.1941120Z  * [new tag]                 viable/strict/1760671704    -> viable/strict/1760671704
2025-12-04T09:17:19.1942640Z  * [new tag]                 viable/strict/1760673121    -> viable/strict/1760673121
2025-12-04T09:17:19.1944104Z  * [new tag]                 viable/strict/1760675352    -> viable/strict/1760675352
2025-12-04T09:17:19.1945594Z  * [new tag]                 viable/strict/1760696731    -> viable/strict/1760696731
2025-12-04T09:17:19.1948511Z  * [new tag]                 viable/strict/1760723515    -> viable/strict/1760723515
2025-12-04T09:17:19.1949993Z  * [new tag]                 viable/strict/1760727234    -> viable/strict/1760727234
2025-12-04T09:17:19.1951513Z  * [new tag]                 viable/strict/1760730578    -> viable/strict/1760730578
2025-12-04T09:17:19.1953023Z  * [new tag]                 viable/strict/1760732726    -> viable/strict/1760732726
2025-12-04T09:17:19.1954700Z  * [new tag]                 viable/strict/1760734180    -> viable/strict/1760734180
2025-12-04T09:17:19.1956104Z  * [new tag]                 viable/strict/1760736251    -> viable/strict/1760736251
2025-12-04T09:17:19.1957572Z  * [new tag]                 viable/strict/1760737772    -> viable/strict/1760737772
2025-12-04T09:17:19.1959136Z  * [new tag]                 viable/strict/1760758005    -> viable/strict/1760758005
2025-12-04T09:17:19.1960586Z  * [new tag]                 viable/strict/1760761532    -> viable/strict/1760761532
2025-12-04T09:17:19.1962136Z  * [new tag]                 viable/strict/1760802581    -> viable/strict/1760802581
2025-12-04T09:17:19.1963588Z  * [new tag]                 viable/strict/1760827772    -> viable/strict/1760827772
2025-12-04T09:17:19.1965078Z  * [new tag]                 viable/strict/1760834524    -> viable/strict/1760834524
2025-12-04T09:17:19.1966667Z  * [new tag]                 viable/strict/1760845009    -> viable/strict/1760845009
2025-12-04T09:17:19.1968229Z  * [new tag]                 viable/strict/1760876836    -> viable/strict/1760876836
2025-12-04T09:17:19.1969730Z  * [new tag]                 viable/strict/1760880329    -> viable/strict/1760880329
2025-12-04T09:17:19.1971192Z  * [new tag]                 viable/strict/1760888987    -> viable/strict/1760888987
2025-12-04T09:17:19.1972651Z  * [new tag]                 viable/strict/1760912664    -> viable/strict/1760912664
2025-12-04T09:17:19.1974241Z  * [new tag]                 viable/strict/1760925321    -> viable/strict/1760925321
2025-12-04T09:17:19.1975673Z  * [new tag]                 viable/strict/1760931488    -> viable/strict/1760931488
2025-12-04T09:17:19.1977177Z  * [new tag]                 viable/strict/1760932693    -> viable/strict/1760932693
2025-12-04T09:17:19.1978687Z  * [new tag]                 viable/strict/1761004184    -> viable/strict/1761004184
2025-12-04T09:17:19.1980369Z  * [new tag]                 viable/strict/1761014748    -> viable/strict/1761014748
2025-12-04T09:17:19.1981872Z  * [new tag]                 viable/strict/1761017491    -> viable/strict/1761017491
2025-12-04T09:17:19.1983398Z  * [new tag]                 viable/strict/1761018806    -> viable/strict/1761018806
2025-12-04T09:17:19.1984981Z  * [new tag]                 viable/strict/1761020754    -> viable/strict/1761020754
2025-12-04T09:17:19.1986534Z  * [new tag]                 viable/strict/1761024303    -> viable/strict/1761024303
2025-12-04T09:17:19.1988451Z  * [new tag]                 viable/strict/1761029582    -> viable/strict/1761029582
2025-12-04T09:17:19.1989965Z  * [new tag]                 viable/strict/1761031535    -> viable/strict/1761031535
2025-12-04T09:17:19.1991448Z  * [new tag]                 viable/strict/1761035196    -> viable/strict/1761035196
2025-12-04T09:17:19.1992992Z  * [new tag]                 viable/strict/1761045825    -> viable/strict/1761045825
2025-12-04T09:17:19.1994508Z  * [new tag]                 viable/strict/1761054796    -> viable/strict/1761054796
2025-12-04T09:17:19.1996110Z  * [new tag]                 viable/strict/1761060314    -> viable/strict/1761060314
2025-12-04T09:17:19.1997650Z  * [new tag]                 viable/strict/1761071198    -> viable/strict/1761071198
2025-12-04T09:17:19.1999182Z  * [new tag]                 viable/strict/1761074628    -> viable/strict/1761074628
2025-12-04T09:17:19.2000688Z  * [new tag]                 viable/strict/1761078351    -> viable/strict/1761078351
2025-12-04T09:17:19.2002220Z  * [new tag]                 viable/strict/1761079822    -> viable/strict/1761079822
2025-12-04T09:17:19.2003716Z  * [new tag]                 viable/strict/1761081873    -> viable/strict/1761081873
2025-12-04T09:17:19.2005172Z  * [new tag]                 viable/strict/1761083392    -> viable/strict/1761083392
2025-12-04T09:17:19.2006684Z  * [new tag]                 viable/strict/1761085465    -> viable/strict/1761085465
2025-12-04T09:17:19.2008423Z  * [new tag]                 viable/strict/1761089099    -> viable/strict/1761089099
2025-12-04T09:17:19.2010168Z  * [new tag]                 viable/strict/1761095535    -> viable/strict/1761095535
2025-12-04T09:17:19.2011306Z  * [new tag]                 viable/strict/1761098119    -> viable/strict/1761098119
2025-12-04T09:17:19.2013346Z  * [new tag]                 viable/strict/1761101330    -> viable/strict/1761101330
2025-12-04T09:17:19.2014857Z  * [new tag]                 viable/strict/1761114425    -> viable/strict/1761114425
2025-12-04T09:17:19.2016333Z  * [new tag]                 viable/strict/1761116036    -> viable/strict/1761116036
2025-12-04T09:17:19.2017835Z  * [new tag]                 viable/strict/1761119379    -> viable/strict/1761119379
2025-12-04T09:17:19.2019359Z  * [new tag]                 viable/strict/1761121601    -> viable/strict/1761121601
2025-12-04T09:17:19.2020969Z  * [new tag]                 viable/strict/1761123234    -> viable/strict/1761123234
2025-12-04T09:17:19.2022558Z  * [new tag]                 viable/strict/1761126621    -> viable/strict/1761126621
2025-12-04T09:17:19.2023969Z  * [new tag]                 viable/strict/1761132259    -> viable/strict/1761132259
2025-12-04T09:17:19.2025537Z  * [new tag]                 viable/strict/1761146746    -> viable/strict/1761146746
2025-12-04T09:17:19.2027033Z  * [new tag]                 viable/strict/1761164752    -> viable/strict/1761164752
2025-12-04T09:17:19.2028523Z  * [new tag]                 viable/strict/1761166198    -> viable/strict/1761166198
2025-12-04T09:17:19.2030046Z  * [new tag]                 viable/strict/1761175424    -> viable/strict/1761175424
2025-12-04T09:17:19.2031562Z  * [new tag]                 viable/strict/1761176983    -> viable/strict/1761176983
2025-12-04T09:17:19.2033186Z  * [new tag]                 viable/strict/1761179891    -> viable/strict/1761179891
2025-12-04T09:17:19.2034778Z  * [new tag]                 viable/strict/1761181930    -> viable/strict/1761181930
2025-12-04T09:17:19.2036419Z  * [new tag]                 viable/strict/1761184516    -> viable/strict/1761184516
2025-12-04T09:17:19.2038018Z  * [new tag]                 viable/strict/1761190179    -> viable/strict/1761190179
2025-12-04T09:17:19.2039528Z  * [new tag]                 viable/strict/1761193558    -> viable/strict/1761193558
2025-12-04T09:17:19.2041049Z  * [new tag]                 viable/strict/1761207990    -> viable/strict/1761207990
2025-12-04T09:17:19.2042496Z  * [new tag]                 viable/strict/1761229539    -> viable/strict/1761229539
2025-12-04T09:17:19.2044197Z  * [new tag]                 viable/strict/1761244031    -> viable/strict/1761244031
2025-12-04T09:17:19.2045703Z  * [new tag]                 viable/strict/1761248986    -> viable/strict/1761248986
2025-12-04T09:17:19.2047270Z  * [new tag]                 viable/strict/1761259791    -> viable/strict/1761259791
2025-12-04T09:17:19.2048734Z  * [new tag]                 viable/strict/1761266139    -> viable/strict/1761266139
2025-12-04T09:17:19.2050262Z  * [new tag]                 viable/strict/1761268316    -> viable/strict/1761268316
2025-12-04T09:17:19.2051746Z  * [new tag]                 viable/strict/1761273805    -> viable/strict/1761273805
2025-12-04T09:17:19.2053260Z  * [new tag]                 viable/strict/1761275261    -> viable/strict/1761275261
2025-12-04T09:17:19.2054785Z  * [new tag]                 viable/strict/1761277913    -> viable/strict/1761277913
2025-12-04T09:17:19.2056377Z  * [new tag]                 viable/strict/1761290701    -> viable/strict/1761290701
2025-12-04T09:17:19.2057891Z  * [new tag]                 viable/strict/1761294396    -> viable/strict/1761294396
2025-12-04T09:17:19.2059510Z  * [new tag]                 viable/strict/1761303047    -> viable/strict/1761303047
2025-12-04T09:17:19.2060989Z  * [new tag]                 viable/strict/1761335388    -> viable/strict/1761335388
2025-12-04T09:17:19.2062543Z  * [new tag]                 viable/strict/1761337551    -> viable/strict/1761337551
2025-12-04T09:17:19.2064135Z  * [new tag]                 viable/strict/1761339007    -> viable/strict/1761339007
2025-12-04T09:17:19.2065607Z  * [new tag]                 viable/strict/1761341050    -> viable/strict/1761341050
2025-12-04T09:17:19.2067048Z  * [new tag]                 viable/strict/1761346188    -> viable/strict/1761346188
2025-12-04T09:17:19.2068609Z  * [new tag]                 viable/strict/1761349792    -> viable/strict/1761349792
2025-12-04T09:17:19.2070216Z  * [new tag]                 viable/strict/1761352620    -> viable/strict/1761352620
2025-12-04T09:17:19.2071722Z  * [new tag]                 viable/strict/1761354730    -> viable/strict/1761354730
2025-12-04T09:17:19.2073253Z  * [new tag]                 viable/strict/1761357298    -> viable/strict/1761357298
2025-12-04T09:17:19.2074758Z  * [new tag]                 viable/strict/1761360201    -> viable/strict/1761360201
2025-12-04T09:17:19.2076279Z  * [new tag]                 viable/strict/1761361753    -> viable/strict/1761361753
2025-12-04T09:17:19.2077795Z  * [new tag]                 viable/strict/1761364351    -> viable/strict/1761364351
2025-12-04T09:17:19.2079351Z  * [new tag]                 viable/strict/1761366338    -> viable/strict/1761366338
2025-12-04T09:17:19.2080957Z  * [new tag]                 viable/strict/1761367802    -> viable/strict/1761367802
2025-12-04T09:17:19.2082438Z  * [new tag]                 viable/strict/1761369889    -> viable/strict/1761369889
2025-12-04T09:17:19.2084009Z  * [new tag]                 viable/strict/1761371385    -> viable/strict/1761371385
2025-12-04T09:17:19.2085888Z  * [new tag]                 viable/strict/1761373581    -> viable/strict/1761373581
2025-12-04T09:17:19.2088036Z  * [new tag]                 viable/strict/1761375054    -> viable/strict/1761375054
2025-12-04T09:17:19.2089584Z  * [new tag]                 viable/strict/1761421785    -> viable/strict/1761421785
2025-12-04T09:17:19.2091167Z  * [new tag]                 viable/strict/1761434614    -> viable/strict/1761434614
2025-12-04T09:17:19.2093034Z  * [new tag]                 viable/strict/1761439254    -> viable/strict/1761439254
2025-12-04T09:17:19.2094587Z  * [new tag]                 viable/strict/1761454187    -> viable/strict/1761454187
2025-12-04T09:17:19.2096278Z  * [new tag]                 viable/strict/1761459991    -> viable/strict/1761459991
2025-12-04T09:17:19.2098053Z  * [new tag]                 viable/strict/1761470668    -> viable/strict/1761470668
2025-12-04T09:17:19.2100115Z  * [new tag]                 viable/strict/1761472188    -> viable/strict/1761472188
2025-12-04T09:17:19.2101608Z  * [new tag]                 viable/strict/1761503178    -> viable/strict/1761503178
2025-12-04T09:17:19.2103116Z  * [new tag]                 viable/strict/1761517492    -> viable/strict/1761517492
2025-12-04T09:17:19.2104636Z  * [new tag]                 viable/strict/1761518981    -> viable/strict/1761518981
2025-12-04T09:17:19.2106257Z  * [new tag]                 viable/strict/1761533609    -> viable/strict/1761533609
2025-12-04T09:17:19.2107616Z  * [new tag]                 viable/strict/1761546438    -> viable/strict/1761546438
2025-12-04T09:17:19.2109456Z  * [new tag]                 viable/strict/1761548133    -> viable/strict/1761548133
2025-12-04T09:17:19.2111245Z  * [new tag]                 viable/strict/1761555186    -> viable/strict/1761555186
2025-12-04T09:17:19.2112910Z  * [new tag]                 viable/strict/1761557178    -> viable/strict/1761557178
2025-12-04T09:17:19.2114425Z  * [new tag]                 viable/strict/1761560772    -> viable/strict/1761560772
2025-12-04T09:17:19.2115994Z  * [new tag]                 viable/strict/1761562266    -> viable/strict/1761562266
2025-12-04T09:17:19.2117559Z  * [new tag]                 viable/strict/1761564260    -> viable/strict/1761564260
2025-12-04T09:17:19.2119051Z  * [new tag]                 viable/strict/1761568072    -> viable/strict/1761568072
2025-12-04T09:17:19.2120585Z  * [new tag]                 viable/strict/1761571683    -> viable/strict/1761571683
2025-12-04T09:17:19.2122153Z  * [new tag]                 viable/strict/1761580199    -> viable/strict/1761580199
2025-12-04T09:17:19.2123536Z  * [new tag]                 viable/strict/1761587383    -> viable/strict/1761587383
2025-12-04T09:17:19.2125119Z  * [new tag]                 viable/strict/1761591165    -> viable/strict/1761591165
2025-12-04T09:17:19.2126627Z  * [new tag]                 viable/strict/1761594575    -> viable/strict/1761594575
2025-12-04T09:17:19.2128205Z  * [new tag]                 viable/strict/1761596710    -> viable/strict/1761596710
2025-12-04T09:17:19.2129714Z  * [new tag]                 viable/strict/1761598189    -> viable/strict/1761598189
2025-12-04T09:17:19.2131225Z  * [new tag]                 viable/strict/1761600254    -> viable/strict/1761600254
2025-12-04T09:17:19.2132756Z  * [new tag]                 viable/strict/1761603879    -> viable/strict/1761603879
2025-12-04T09:17:19.2134326Z  * [new tag]                 viable/strict/1761605429    -> viable/strict/1761605429
2025-12-04T09:17:19.2135944Z  * [new tag]                 viable/strict/1761607468    -> viable/strict/1761607468
2025-12-04T09:17:19.2137554Z  * [new tag]                 viable/strict/1761608983    -> viable/strict/1761608983
2025-12-04T09:17:19.2139138Z  * [new tag]                 viable/strict/1761611846    -> viable/strict/1761611846
2025-12-04T09:17:19.2140815Z  * [new tag]                 viable/strict/1761613922    -> viable/strict/1761613922
2025-12-04T09:17:19.2142175Z  * [new tag]                 viable/strict/1761616504    -> viable/strict/1761616504
2025-12-04T09:17:19.2143505Z  * [new tag]                 viable/strict/1761619599    -> viable/strict/1761619599
2025-12-04T09:17:19.2145113Z  * [new tag]                 viable/strict/1761686693    -> viable/strict/1761686693
2025-12-04T09:17:19.2146686Z  * [new tag]                 viable/strict/1761688179    -> viable/strict/1761688179
2025-12-04T09:17:19.2148176Z  * [new tag]                 viable/strict/1761691973    -> viable/strict/1761691973
2025-12-04T09:17:19.2149798Z  * [new tag]                 viable/strict/1761693884    -> viable/strict/1761693884
2025-12-04T09:17:19.2151402Z  * [new tag]                 viable/strict/1761695389    -> viable/strict/1761695389
2025-12-04T09:17:19.2152984Z  * [new tag]                 viable/strict/1761698408    -> viable/strict/1761698408
2025-12-04T09:17:19.2154445Z  * [new tag]                 viable/strict/1761702931    -> viable/strict/1761702931
2025-12-04T09:17:19.2156028Z  * [new tag]                 viable/strict/1761706307    -> viable/strict/1761706307
2025-12-04T09:17:19.2157605Z  * [new tag]                 viable/strict/1761709065    -> viable/strict/1761709065
2025-12-04T09:17:19.2159297Z  * [new tag]                 viable/strict/1761710285    -> viable/strict/1761710285
2025-12-04T09:17:19.2160866Z  * [new tag]                 viable/strict/1761711983    -> viable/strict/1761711983
2025-12-04T09:17:19.2162472Z  * [new tag]                 viable/strict/1761713514    -> viable/strict/1761713514
2025-12-04T09:17:19.2164180Z  * [new tag]                 viable/strict/1761715523    -> viable/strict/1761715523
2025-12-04T09:17:19.2165863Z  * [new tag]                 viable/strict/1761727973    -> viable/strict/1761727973
2025-12-04T09:17:19.2167485Z  * [new tag]                 viable/strict/1761751558    -> viable/strict/1761751558
2025-12-04T09:17:19.2169149Z  * [new tag]                 viable/strict/1761755187    -> viable/strict/1761755187
2025-12-04T09:17:19.2170727Z  * [new tag]                 viable/strict/1761756826    -> viable/strict/1761756826
2025-12-04T09:17:19.2172343Z  * [new tag]                 viable/strict/1761769551    -> viable/strict/1761769551
2025-12-04T09:17:19.2173960Z  * [new tag]                 viable/strict/1761771032    -> viable/strict/1761771032
2025-12-04T09:17:19.2175489Z  * [new tag]                 viable/strict/1761773101    -> viable/strict/1761773101
2025-12-04T09:17:19.2177122Z  * [new tag]                 viable/strict/1761781792    -> viable/strict/1761781792
2025-12-04T09:17:19.2178889Z  * [new tag]                 viable/strict/1761784788    -> viable/strict/1761784788
2025-12-04T09:17:19.2180510Z  * [new tag]                 viable/strict/1761786740    -> viable/strict/1761786740
2025-12-04T09:17:19.2182116Z  * [new tag]                 viable/strict/1761789332    -> viable/strict/1761789332
2025-12-04T09:17:19.2184172Z  * [new tag]                 viable/strict/1761792569    -> viable/strict/1761792569
2025-12-04T09:17:19.2185754Z  * [new tag]                 viable/strict/1761795289    -> viable/strict/1761795289
2025-12-04T09:17:19.2187317Z  * [new tag]                 viable/strict/1761798345    -> viable/strict/1761798345
2025-12-04T09:17:19.2189006Z  * [new tag]                 viable/strict/1761799827    -> viable/strict/1761799827
2025-12-04T09:17:19.2191167Z  * [new tag]                 viable/strict/1761805604    -> viable/strict/1761805604
2025-12-04T09:17:19.2192726Z  * [new tag]                 viable/strict/1761807202    -> viable/strict/1761807202
2025-12-04T09:17:19.2194307Z  * [new tag]                 viable/strict/1761809094    -> viable/strict/1761809094
2025-12-04T09:17:19.2195883Z  * [new tag]                 viable/strict/1761810576    -> viable/strict/1761810576
2025-12-04T09:17:19.2197504Z  * [new tag]                 viable/strict/1761812771    -> viable/strict/1761812771
2025-12-04T09:17:19.2199090Z  * [new tag]                 viable/strict/1761814363    -> viable/strict/1761814363
2025-12-04T09:17:19.2200700Z  * [new tag]                 viable/strict/1761857410    -> viable/strict/1761857410
2025-12-04T09:17:19.2202323Z  * [new tag]                 viable/strict/1761860985    -> viable/strict/1761860985
2025-12-04T09:17:19.2203947Z  * [new tag]                 viable/strict/1761863094    -> viable/strict/1761863094
2025-12-04T09:17:19.2205499Z  * [new tag]                 viable/strict/1761864590    -> viable/strict/1761864590
2025-12-04T09:17:19.2207084Z  * [new tag]                 viable/strict/1761866675    -> viable/strict/1761866675
2025-12-04T09:17:19.2208892Z  * [new tag]                 viable/strict/1761868178    -> viable/strict/1761868178
2025-12-04T09:17:19.2213332Z  * [new tag]                 viable/strict/1761871111    -> viable/strict/1761871111
2025-12-04T09:17:19.2214931Z  * [new tag]                 viable/strict/1761873126    -> viable/strict/1761873126
2025-12-04T09:17:19.2216541Z  * [new tag]                 viable/strict/1761875714    -> viable/strict/1761875714
2025-12-04T09:17:19.2218147Z  * [new tag]                 viable/strict/1761878924    -> viable/strict/1761878924
2025-12-04T09:17:19.2219910Z  * [new tag]                 viable/strict/1761881727    -> viable/strict/1761881727
2025-12-04T09:17:19.2221502Z  * [new tag]                 viable/strict/1761882959    -> viable/strict/1761882959
2025-12-04T09:17:19.2223103Z  * [new tag]                 viable/strict/1761886268    -> viable/strict/1761886268
2025-12-04T09:17:19.2224676Z  * [new tag]                 viable/strict/1761893641    -> viable/strict/1761893641
2025-12-04T09:17:19.2226320Z  * [new tag]                 viable/strict/1761931517    -> viable/strict/1761931517
2025-12-04T09:17:19.2227883Z  * [new tag]                 viable/strict/1761933080    -> viable/strict/1761933080
2025-12-04T09:17:19.2229434Z  * [new tag]                 viable/strict/1761935217    -> viable/strict/1761935217
2025-12-04T09:17:19.2231141Z  * [new tag]                 viable/strict/1761938533    -> viable/strict/1761938533
2025-12-04T09:17:19.2232808Z  * [new tag]                 viable/strict/1761940184    -> viable/strict/1761940184
2025-12-04T09:17:19.2234398Z  * [new tag]                 viable/strict/1761942338    -> viable/strict/1761942338
2025-12-04T09:17:19.2236005Z  * [new tag]                 viable/strict/1761946100    -> viable/strict/1761946100
2025-12-04T09:17:19.2237557Z  * [new tag]                 viable/strict/1761947374    -> viable/strict/1761947374
2025-12-04T09:17:19.2239130Z  * [new tag]                 viable/strict/1761950978    -> viable/strict/1761950978
2025-12-04T09:17:19.2240850Z  * [new tag]                 viable/strict/1761957727    -> viable/strict/1761957727
2025-12-04T09:17:19.2242344Z  * [new tag]                 viable/strict/1761959532    -> viable/strict/1761959532
2025-12-04T09:17:19.2244030Z  * [new tag]                 viable/strict/1761965366    -> viable/strict/1761965366
2025-12-04T09:17:19.2245911Z  * [new tag]                 viable/strict/1761968066    -> viable/strict/1761968066
2025-12-04T09:17:19.2247471Z  * [new tag]                 viable/strict/1761969322    -> viable/strict/1761969322
2025-12-04T09:17:19.2249048Z  * [new tag]                 viable/strict/1761974723    -> viable/strict/1761974723
2025-12-04T09:17:19.2250735Z  * [new tag]                 viable/strict/1761981837    -> viable/strict/1761981837
2025-12-04T09:17:19.2252390Z  * [new tag]                 viable/strict/1761985546    -> viable/strict/1761985546
2025-12-04T09:17:19.2254003Z  * [new tag]                 viable/strict/1761987030    -> viable/strict/1761987030
2025-12-04T09:17:19.2255637Z  * [new tag]                 viable/strict/1762003554    -> viable/strict/1762003554
2025-12-04T09:17:19.2257218Z  * [new tag]                 viable/strict/1762021560    -> viable/strict/1762021560
2025-12-04T09:17:19.2258850Z  * [new tag]                 viable/strict/1762032190    -> viable/strict/1762032190
2025-12-04T09:17:19.2260635Z  * [new tag]                 viable/strict/1762040981    -> viable/strict/1762040981
2025-12-04T09:17:19.2262288Z  * [new tag]                 viable/strict/1762048525    -> viable/strict/1762048525
2025-12-04T09:17:19.2263905Z  * [new tag]                 viable/strict/1762104223    -> viable/strict/1762104223
2025-12-04T09:17:19.2265562Z  * [new tag]                 viable/strict/1762105778    -> viable/strict/1762105778
2025-12-04T09:17:19.2267076Z  * [new tag]                 viable/strict/1762115109    -> viable/strict/1762115109
2025-12-04T09:17:19.2268626Z  * [new tag]                 viable/strict/1762125840    -> viable/strict/1762125840
2025-12-04T09:17:19.2270124Z  * [new tag]                 viable/strict/1762127377    -> viable/strict/1762127377
2025-12-04T09:17:19.2272017Z  * [new tag]                 viable/strict/1762134925    -> viable/strict/1762134925
2025-12-04T09:17:19.2273516Z  * [new tag]                 viable/strict/1762138338    -> viable/strict/1762138338
2025-12-04T09:17:19.2275137Z  * [new tag]                 viable/strict/1762148993    -> viable/strict/1762148993
2025-12-04T09:17:19.2276787Z  * [new tag]                 viable/strict/1762152871    -> viable/strict/1762152871
2025-12-04T09:17:19.2278357Z  * [new tag]                 viable/strict/1762156183    -> viable/strict/1762156183
2025-12-04T09:17:19.2279914Z  * [new tag]                 viable/strict/1762163457    -> viable/strict/1762163457
2025-12-04T09:17:19.2281494Z  * [new tag]                 viable/strict/1762165569    -> viable/strict/1762165569
2025-12-04T09:17:19.2283061Z  * [new tag]                 viable/strict/1762169035    -> viable/strict/1762169035
2025-12-04T09:17:19.2284779Z  * [new tag]                 viable/strict/1762174936    -> viable/strict/1762174936
2025-12-04T09:17:19.2286332Z  * [new tag]                 viable/strict/1762194412    -> viable/strict/1762194412
2025-12-04T09:17:19.2287916Z  * [new tag]                 viable/strict/1762195876    -> viable/strict/1762195876
2025-12-04T09:17:19.2289527Z  * [new tag]                 viable/strict/1762197788    -> viable/strict/1762197788
2025-12-04T09:17:19.2291130Z  * [new tag]                 viable/strict/1762199389    -> viable/strict/1762199389
2025-12-04T09:17:19.2292943Z  * [new tag]                 viable/strict/1762206585    -> viable/strict/1762206585
2025-12-04T09:17:19.2294605Z  * [new tag]                 viable/strict/1762210184    -> viable/strict/1762210184
2025-12-04T09:17:19.2296193Z  * [new tag]                 viable/strict/1762218736    -> viable/strict/1762218736
2025-12-04T09:17:19.2298301Z  * [new tag]                 viable/strict/1762224529    -> viable/strict/1762224529
2025-12-04T09:17:19.2300177Z  * [new tag]                 viable/strict/1762227253    -> viable/strict/1762227253
2025-12-04T09:17:19.2301525Z  * [new tag]                 viable/strict/1762228515    -> viable/strict/1762228515
2025-12-04T09:17:19.2303217Z  * [new tag]                 viable/strict/1762230349    -> viable/strict/1762230349
2025-12-04T09:17:19.2304880Z  * [new tag]                 viable/strict/1762231859    -> viable/strict/1762231859
2025-12-04T09:17:19.2306466Z  * [new tag]                 viable/strict/1762233925    -> viable/strict/1762233925
2025-12-04T09:17:19.2308425Z  * [new tag]                 viable/strict/1762237630    -> viable/strict/1762237630
2025-12-04T09:17:19.2309832Z  * [new tag]                 viable/strict/1762253522    -> viable/strict/1762253522
2025-12-04T09:17:19.2311645Z  * [new tag]                 viable/strict/1762278588    -> viable/strict/1762278588
2025-12-04T09:17:19.2313263Z  * [new tag]                 viable/strict/1762284203    -> viable/strict/1762284203
2025-12-04T09:17:19.2314881Z  * [new tag]                 viable/strict/1762289446    -> viable/strict/1762289446
2025-12-04T09:17:19.2316520Z  * [new tag]                 viable/strict/1762291515    -> viable/strict/1762291515
2025-12-04T09:17:19.2318085Z  * [new tag]                 viable/strict/1762295100    -> viable/strict/1762295100
2025-12-04T09:17:19.2319663Z  * [new tag]                 viable/strict/1762296590    -> viable/strict/1762296590
2025-12-04T09:17:19.2321265Z  * [new tag]                 viable/strict/1762300179    -> viable/strict/1762300179
2025-12-04T09:17:19.2322747Z  * [new tag]                 viable/strict/1762303207    -> viable/strict/1762303207
2025-12-04T09:17:19.2324355Z  * [new tag]                 viable/strict/1762386584    -> viable/strict/1762386584
2025-12-04T09:17:19.2325968Z  * [new tag]                 viable/strict/1762391537    -> viable/strict/1762391537
2025-12-04T09:17:19.2327428Z  * [new tag]                 viable/strict/1762394119    -> viable/strict/1762394119
2025-12-04T09:17:19.2329342Z  * [new tag]                 viable/strict/1762397437    -> viable/strict/1762397437
2025-12-04T09:17:19.2330918Z  * [new tag]                 viable/strict/1762400256    -> viable/strict/1762400256
2025-12-04T09:17:19.2332492Z  * [new tag]                 viable/strict/1762401469    -> viable/strict/1762401469
2025-12-04T09:17:19.2334183Z  * [new tag]                 viable/strict/1762408195    -> viable/strict/1762408195
2025-12-04T09:17:19.2336033Z  * [new tag]                 viable/strict/1762410411    -> viable/strict/1762410411
2025-12-04T09:17:19.2337693Z  * [new tag]                 viable/strict/1762417613    -> viable/strict/1762417613
2025-12-04T09:17:19.2339416Z  * [new tag]                 viable/strict/1762419198    -> viable/strict/1762419198
2025-12-04T09:17:19.2341144Z  * [new tag]                 viable/strict/1762422656    -> viable/strict/1762422656
2025-12-04T09:17:19.2343145Z  * [new tag]                 viable/strict/1762424746    -> viable/strict/1762424746
2025-12-04T09:17:19.2344765Z  * [new tag]                 viable/strict/1762446386    -> viable/strict/1762446386
2025-12-04T09:17:19.2346397Z  * [new tag]                 viable/strict/1762449912    -> viable/strict/1762449912
2025-12-04T09:17:19.2347995Z  * [new tag]                 viable/strict/1762457031    -> viable/strict/1762457031
2025-12-04T09:17:19.2349748Z  * [new tag]                 viable/strict/1762462441    -> viable/strict/1762462441
2025-12-04T09:17:19.2351293Z  * [new tag]                 viable/strict/1762467909    -> viable/strict/1762467909
2025-12-04T09:17:19.2352932Z  * [new tag]                 viable/strict/1762471493    -> viable/strict/1762471493
2025-12-04T09:17:19.2354580Z  * [new tag]                 viable/strict/1762475990    -> viable/strict/1762475990
2025-12-04T09:17:19.2356334Z  * [new tag]                 viable/strict/1762477933    -> viable/strict/1762477933
2025-12-04T09:17:19.2357916Z  * [new tag]                 viable/strict/1762491053    -> viable/strict/1762491053
2025-12-04T09:17:19.2359662Z  * [new tag]                 viable/strict/1762493118    -> viable/strict/1762493118
2025-12-04T09:17:19.2361122Z  * [new tag]                 viable/strict/1762498442    -> viable/strict/1762498442
2025-12-04T09:17:19.2362761Z  * [new tag]                 viable/strict/1762501778    -> viable/strict/1762501778
2025-12-04T09:17:19.2364415Z  * [new tag]                 viable/strict/1762504001    -> viable/strict/1762504001
2025-12-04T09:17:19.2366051Z  * [new tag]                 viable/strict/1762505583    -> viable/strict/1762505583
2025-12-04T09:17:19.2367789Z  * [new tag]                 viable/strict/1762507523    -> viable/strict/1762507523
2025-12-04T09:17:19.2369427Z  * [new tag]                 viable/strict/1762511140    -> viable/strict/1762511140
2025-12-04T09:17:19.2372501Z  * [new tag]                 viable/strict/1762512632    -> viable/strict/1762512632
2025-12-04T09:17:19.2372843Z  * [new tag]                 viable/strict/1762520467    -> viable/strict/1762520467
2025-12-04T09:17:19.2374458Z  * [new tag]                 viable/strict/1762522016    -> viable/strict/1762522016
2025-12-04T09:17:19.2376021Z  * [new tag]                 viable/strict/1762530591    -> viable/strict/1762530591
2025-12-04T09:17:19.2377580Z  * [new tag]                 viable/strict/1762543405    -> viable/strict/1762543405
2025-12-04T09:17:19.2379070Z  * [new tag]                 viable/strict/1762544998    -> viable/strict/1762544998
2025-12-04T09:17:19.2380723Z  * [new tag]                 viable/strict/1762552182    -> viable/strict/1762552182
2025-12-04T09:17:19.2382353Z  * [new tag]                 viable/strict/1762554297    -> viable/strict/1762554297
2025-12-04T09:17:19.2383776Z  * [new tag]                 viable/strict/1762559381    -> viable/strict/1762559381
2025-12-04T09:17:19.2385380Z  * [new tag]                 viable/strict/1762562222    -> viable/strict/1762562222
2025-12-04T09:17:19.2386961Z  * [new tag]                 viable/strict/1762564319    -> viable/strict/1762564319
2025-12-04T09:17:19.2388480Z  * [new tag]                 viable/strict/1762566904    -> viable/strict/1762566904
2025-12-04T09:17:19.2390092Z  * [new tag]                 viable/strict/1762569781    -> viable/strict/1762569781
2025-12-04T09:17:19.2391666Z  * [new tag]                 viable/strict/1762575940    -> viable/strict/1762575940
2025-12-04T09:17:19.2393273Z  * [new tag]                 viable/strict/1762580974    -> viable/strict/1762580974
2025-12-04T09:17:19.2394954Z  * [new tag]                 viable/strict/1762583185    -> viable/strict/1762583185
2025-12-04T09:17:19.2396493Z  * [new tag]                 viable/strict/1762586647    -> viable/strict/1762586647
2025-12-04T09:17:19.2398072Z  * [new tag]                 viable/strict/1762588183    -> viable/strict/1762588183
2025-12-04T09:17:19.2399673Z  * [new tag]                 viable/strict/1762593886    -> viable/strict/1762593886
2025-12-04T09:17:19.2401466Z  * [new tag]                 viable/strict/1762650743    -> viable/strict/1762650743
2025-12-04T09:17:19.2403601Z  * [new tag]                 viable/strict/1762653328    -> viable/strict/1762653328
2025-12-04T09:17:19.2405234Z  * [new tag]                 viable/strict/1762659342    -> viable/strict/1762659342
2025-12-04T09:17:19.2406867Z  * [new tag]                 viable/strict/1762662360    -> viable/strict/1762662360
2025-12-04T09:17:19.2408766Z  * [new tag]                 viable/strict/1762667377    -> viable/strict/1762667377
2025-12-04T09:17:19.2410266Z  * [new tag]                 viable/strict/1762671090    -> viable/strict/1762671090
2025-12-04T09:17:19.2411870Z  * [new tag]                 viable/strict/1762680284    -> viable/strict/1762680284
2025-12-04T09:17:19.2413526Z  * [new tag]                 viable/strict/1762683900    -> viable/strict/1762683900
2025-12-04T09:17:19.2415132Z  * [new tag]                 viable/strict/1762705541    -> viable/strict/1762705541
2025-12-04T09:17:19.2416766Z  * [new tag]                 viable/strict/1762709004    -> viable/strict/1762709004
2025-12-04T09:17:19.2418558Z  * [new tag]                 viable/strict/1762746004    -> viable/strict/1762746004
2025-12-04T09:17:19.2420365Z  * [new tag]                 viable/strict/1762748799    -> viable/strict/1762748799
2025-12-04T09:17:19.2421960Z  * [new tag]                 viable/strict/1762759504    -> viable/strict/1762759504
2025-12-04T09:17:19.2423706Z  * [new tag]                 viable/strict/1762760973    -> viable/strict/1762760973
2025-12-04T09:17:19.2425290Z  * [new tag]                 viable/strict/1762775374    -> viable/strict/1762775374
2025-12-04T09:17:19.2426960Z  * [new tag]                 viable/strict/1762777661    -> viable/strict/1762777661
2025-12-04T09:17:19.2428542Z  * [new tag]                 viable/strict/1762779774    -> viable/strict/1762779774
2025-12-04T09:17:19.2430279Z  * [new tag]                 viable/strict/1762781259    -> viable/strict/1762781259
2025-12-04T09:17:19.2431910Z  * [new tag]                 viable/strict/1762793628    -> viable/strict/1762793628
2025-12-04T09:17:19.2433609Z  * [new tag]                 viable/strict/1762800711    -> viable/strict/1762800711
2025-12-04T09:17:19.2435217Z  * [new tag]                 viable/strict/1762809894    -> viable/strict/1762809894
2025-12-04T09:17:19.2436807Z  * [new tag]                 viable/strict/1762811384    -> viable/strict/1762811384
2025-12-04T09:17:19.2438480Z  * [new tag]                 viable/strict/1762813841    -> viable/strict/1762813841
2025-12-04T09:17:19.2440176Z  * [new tag]                 viable/strict/1762815047    -> viable/strict/1762815047
2025-12-04T09:17:19.2441908Z  * [new tag]                 viable/strict/1762817094    -> viable/strict/1762817094
2025-12-04T09:17:19.2443534Z  * [new tag]                 viable/strict/1762818582    -> viable/strict/1762818582
2025-12-04T09:17:19.2445130Z  * [new tag]                 viable/strict/1762821623    -> viable/strict/1762821623
2025-12-04T09:17:19.2446550Z  * [new tag]                 viable/strict/1762823531    -> viable/strict/1762823531
2025-12-04T09:17:19.2448269Z  * [new tag]                 viable/strict/1762849583    -> viable/strict/1762849583
2025-12-04T09:17:19.2449874Z  * [new tag]                 viable/strict/1762851200    -> viable/strict/1762851200
2025-12-04T09:17:19.2451471Z  * [new tag]                 viable/strict/1762854603    -> viable/strict/1762854603
2025-12-04T09:17:19.2453097Z  * [new tag]                 viable/strict/1762858276    -> viable/strict/1762858276
2025-12-04T09:17:19.2454852Z  * [new tag]                 viable/strict/1762860891    -> viable/strict/1762860891
2025-12-04T09:17:19.2457079Z  * [new tag]                 viable/strict/1762866174    -> viable/strict/1762866174
2025-12-04T09:17:19.2458660Z  * [new tag]                 viable/strict/1762867653    -> viable/strict/1762867653
2025-12-04T09:17:19.2460409Z  * [new tag]                 viable/strict/1762872669    -> viable/strict/1762872669
2025-12-04T09:17:19.2461872Z  * [new tag]                 viable/strict/1762878380    -> viable/strict/1762878380
2025-12-04T09:17:19.2463501Z  * [new tag]                 viable/strict/1762889003    -> viable/strict/1762889003
2025-12-04T09:17:19.2465101Z  * [new tag]                 viable/strict/1762890589    -> viable/strict/1762890589
2025-12-04T09:17:19.2466707Z  * [new tag]                 viable/strict/1762892743    -> viable/strict/1762892743
2025-12-04T09:17:19.2468345Z  * [new tag]                 viable/strict/1762894271    -> viable/strict/1762894271
2025-12-04T09:17:19.2469766Z  * [new tag]                 viable/strict/1762896287    -> viable/strict/1762896287
2025-12-04T09:17:19.2471406Z  * [new tag]                 viable/strict/1762915871    -> viable/strict/1762915871
2025-12-04T09:17:19.2473051Z  * [new tag]                 viable/strict/1762918569    -> viable/strict/1762918569
2025-12-04T09:17:19.2474502Z  * [new tag]                 viable/strict/1762919776    -> viable/strict/1762919776
2025-12-04T09:17:19.2476073Z  * [new tag]                 viable/strict/1762923072    -> viable/strict/1762923072
2025-12-04T09:17:19.2477890Z  * [new tag]                 viable/strict/1762928826    -> viable/strict/1762928826
2025-12-04T09:17:19.2479516Z  * [new tag]                 viable/strict/1762930451    -> viable/strict/1762930451
2025-12-04T09:17:19.2481070Z  * [new tag]                 viable/strict/1762933780    -> viable/strict/1762933780
2025-12-04T09:17:19.2482717Z  * [new tag]                 viable/strict/1762937638    -> viable/strict/1762937638
2025-12-04T09:17:19.2484446Z  * [new tag]                 viable/strict/1762939545    -> viable/strict/1762939545
2025-12-04T09:17:19.2486087Z  * [new tag]                 viable/strict/1762962692    -> viable/strict/1762962692
2025-12-04T09:17:19.2487644Z  * [new tag]                 viable/strict/1762979143    -> viable/strict/1762979143
2025-12-04T09:17:19.2489242Z  * [new tag]                 viable/strict/1762984188    -> viable/strict/1762984188
2025-12-04T09:17:19.2490710Z  * [new tag]                 viable/strict/1762986306    -> viable/strict/1762986306
2025-12-04T09:17:19.2492391Z  * [new tag]                 viable/strict/1762989903    -> viable/strict/1762989903
2025-12-04T09:17:19.2493977Z  * [new tag]                 viable/strict/1762991377    -> viable/strict/1762991377
2025-12-04T09:17:19.2495575Z  * [new tag]                 viable/strict/1762998921    -> viable/strict/1762998921
2025-12-04T09:17:19.2497260Z  * [new tag]                 viable/strict/1763002287    -> viable/strict/1763002287
2025-12-04T09:17:19.2498997Z  * [new tag]                 viable/strict/1763016840    -> viable/strict/1763016840
2025-12-04T09:17:19.2500676Z  * [new tag]                 viable/strict/1763020180    -> viable/strict/1763020180
2025-12-04T09:17:19.2502387Z  * [new tag]                 viable/strict/1763027421    -> viable/strict/1763027421
2025-12-04T09:17:19.2503979Z  * [new tag]                 viable/strict/1763031120    -> viable/strict/1763031120
2025-12-04T09:17:19.2505704Z  * [new tag]                 viable/strict/1763036861    -> viable/strict/1763036861
2025-12-04T09:17:19.2507404Z  * [new tag]                 viable/strict/1763038993    -> viable/strict/1763038993
2025-12-04T09:17:19.2509785Z  * [new tag]                 viable/strict/1763054703    -> viable/strict/1763054703
2025-12-04T09:17:19.2511253Z  * [new tag]                 viable/strict/1763067061    -> viable/strict/1763067061
2025-12-04T09:17:19.2512904Z  * [new tag]                 viable/strict/1763070847    -> viable/strict/1763070847
2025-12-04T09:17:19.2514526Z  * [new tag]                 viable/strict/1763072706    -> viable/strict/1763072706
2025-12-04T09:17:19.2516265Z  * [new tag]                 viable/strict/1763076302    -> viable/strict/1763076302
2025-12-04T09:17:19.2517967Z  * [new tag]                 viable/strict/1763080816    -> viable/strict/1763080816
2025-12-04T09:17:19.2519552Z  * [new tag]                 viable/strict/1763082732    -> viable/strict/1763082732
2025-12-04T09:17:19.2521130Z  * [new tag]                 viable/strict/1763085329    -> viable/strict/1763085329
2025-12-04T09:17:19.2522796Z  * [new tag]                 viable/strict/1763088623    -> viable/strict/1763088623
2025-12-04T09:17:19.2524539Z  * [new tag]                 viable/strict/1763091402    -> viable/strict/1763091402
2025-12-04T09:17:19.2526124Z  * [new tag]                 viable/strict/1763092602    -> viable/strict/1763092602
2025-12-04T09:17:19.2527715Z  * [new tag]                 viable/strict/1763094355    -> viable/strict/1763094355
2025-12-04T09:17:19.2529386Z  * [new tag]                 viable/strict/1763099390    -> viable/strict/1763099390
2025-12-04T09:17:19.2530983Z  * [new tag]                 viable/strict/1763101608    -> viable/strict/1763101608
2025-12-04T09:17:19.2532636Z  * [new tag]                 viable/strict/1763105102    -> viable/strict/1763105102
2025-12-04T09:17:19.2534358Z  * [new tag]                 viable/strict/1763112347    -> viable/strict/1763112347
2025-12-04T09:17:19.2536047Z  * [new tag]                 viable/strict/1763119471    -> viable/strict/1763119471
2025-12-04T09:17:19.2537702Z  * [new tag]                 viable/strict/1763126835    -> viable/strict/1763126835
2025-12-04T09:17:19.2539050Z  * [new tag]                 viable/strict/1763149779    -> viable/strict/1763149779
2025-12-04T09:17:19.2540808Z  * [new tag]                 viable/strict/1763164178    -> viable/strict/1763164178
2025-12-04T09:17:19.2542466Z  * [new tag]                 viable/strict/1763167104    -> viable/strict/1763167104
2025-12-04T09:17:19.2544002Z  * [new tag]                 viable/strict/1763169132    -> viable/strict/1763169132
2025-12-04T09:17:19.2545615Z  * [new tag]                 viable/strict/1763171708    -> viable/strict/1763171708
2025-12-04T09:17:19.2547207Z  * [new tag]                 viable/strict/1763174759    -> viable/strict/1763174759
2025-12-04T09:17:19.2548819Z  * [new tag]                 viable/strict/1763180744    -> viable/strict/1763180744
2025-12-04T09:17:19.2550431Z  * [new tag]                 viable/strict/1763182227    -> viable/strict/1763182227
2025-12-04T09:17:19.2552023Z  * [new tag]                 viable/strict/1763184309    -> viable/strict/1763184309
2025-12-04T09:17:19.2554109Z  * [new tag]                 viable/strict/1763187991    -> viable/strict/1763187991
2025-12-04T09:17:19.2555732Z  * [new tag]                 viable/strict/1763191445    -> viable/strict/1763191445
2025-12-04T09:17:19.2557593Z  * [new tag]                 viable/strict/1763195152    -> viable/strict/1763195152
2025-12-04T09:17:19.2559055Z  * [new tag]                 viable/strict/1763205769    -> viable/strict/1763205769
2025-12-04T09:17:19.2560790Z  * [new tag]                 viable/strict/1763246990    -> viable/strict/1763246990
2025-12-04T09:17:19.2562471Z  * [new tag]                 viable/strict/1763261578    -> viable/strict/1763261578
2025-12-04T09:17:19.2563994Z  * [new tag]                 viable/strict/1763286573    -> viable/strict/1763286573
2025-12-04T09:17:19.2565430Z  * [new tag]                 viable/strict/1763292167    -> viable/strict/1763292167
2025-12-04T09:17:19.2567082Z  * [new tag]                 viable/strict/1763333386    -> viable/strict/1763333386
2025-12-04T09:17:19.2568694Z  * [new tag]                 viable/strict/1763340082    -> viable/strict/1763340082
2025-12-04T09:17:19.2570971Z  * [new tag]                 viable/strict/1763364324    -> viable/strict/1763364324
2025-12-04T09:17:19.2572637Z  * [new tag]                 viable/strict/1763371569    -> viable/strict/1763371569
2025-12-04T09:17:19.2574222Z  * [new tag]                 viable/strict/1763373067    -> viable/strict/1763373067
2025-12-04T09:17:19.2575825Z  * [new tag]                 viable/strict/1763375157    -> viable/strict/1763375157
2025-12-04T09:17:19.2577447Z  * [new tag]                 viable/strict/1763382462    -> viable/strict/1763382462
2025-12-04T09:17:19.2579221Z  * [new tag]                 viable/strict/1763394661    -> viable/strict/1763394661
2025-12-04T09:17:19.2581007Z  * [new tag]                 viable/strict/1763396797    -> viable/strict/1763396797
2025-12-04T09:17:19.2582701Z  * [new tag]                 viable/strict/1763398542    -> viable/strict/1763398542
2025-12-04T09:17:19.2584371Z  * [new tag]                 viable/strict/1763401807    -> viable/strict/1763401807
2025-12-04T09:17:19.2585815Z  * [new tag]                 viable/strict/1763414698    -> viable/strict/1763414698
2025-12-04T09:17:19.2587653Z  * [new tag]                 viable/strict/1763419807    -> viable/strict/1763419807
2025-12-04T09:17:19.2589331Z  * [new tag]                 viable/strict/1763426369    -> viable/strict/1763426369
2025-12-04T09:17:19.2591021Z  * [new tag]                 viable/strict/1763428331    -> viable/strict/1763428331
2025-12-04T09:17:19.2592693Z  * [new tag]                 viable/strict/1763430922    -> viable/strict/1763430922
2025-12-04T09:17:19.2594151Z  * [new tag]                 viable/strict/1763434184    -> viable/strict/1763434184
2025-12-04T09:17:19.2595776Z  * [new tag]                 viable/strict/1763439973    -> viable/strict/1763439973
2025-12-04T09:17:19.2597545Z  * [new tag]                 viable/strict/1763444995    -> viable/strict/1763444995
2025-12-04T09:17:19.2598993Z  * [new tag]                 viable/strict/1763447206    -> viable/strict/1763447206
2025-12-04T09:17:19.2600641Z  * [new tag]                 viable/strict/1763448826    -> viable/strict/1763448826
2025-12-04T09:17:19.2602283Z  * [new tag]                 viable/strict/1763450717    -> viable/strict/1763450717
2025-12-04T09:17:19.2603981Z  * [new tag]                 viable/strict/1763452183    -> viable/strict/1763452183
2025-12-04T09:17:19.2605625Z  * [new tag]                 viable/strict/1763457945    -> viable/strict/1763457945
2025-12-04T09:17:19.2607257Z  * [new tag]                 viable/strict/1763459439    -> viable/strict/1763459439
2025-12-04T09:17:19.2608817Z  * [new tag]                 viable/strict/1763461556    -> viable/strict/1763461556
2025-12-04T09:17:19.2613080Z  * [new tag]                 viable/strict/1763463103    -> viable/strict/1763463103
2025-12-04T09:17:19.2614738Z  * [new tag]                 viable/strict/1763465100    -> viable/strict/1763465100
2025-12-04T09:17:19.2616292Z  * [new tag]                 viable/strict/1763468866    -> viable/strict/1763468866
2025-12-04T09:17:19.2618190Z  * [new tag]                 viable/strict/1763493823    -> viable/strict/1763493823
2025-12-04T09:17:19.2619794Z  * [new tag]                 viable/strict/1763496249    -> viable/strict/1763496249
2025-12-04T09:17:19.2621408Z  * [new tag]                 viable/strict/1763502620    -> viable/strict/1763502620
2025-12-04T09:17:19.2623052Z  * [new tag]                 viable/strict/1763504715    -> viable/strict/1763504715
2025-12-04T09:17:19.2624655Z  * [new tag]                 viable/strict/1763506208    -> viable/strict/1763506208
2025-12-04T09:17:19.2626384Z  * [new tag]                 viable/strict/1763520590    -> viable/strict/1763520590
2025-12-04T09:17:19.2627963Z  * [new tag]                 viable/strict/1763523357    -> viable/strict/1763523357
2025-12-04T09:17:19.2629628Z  * [new tag]                 viable/strict/1763529922    -> viable/strict/1763529922
2025-12-04T09:17:19.2631336Z  * [new tag]                 viable/strict/1763531408    -> viable/strict/1763531408
2025-12-04T09:17:19.2632917Z  * [new tag]                 viable/strict/1763533622    -> viable/strict/1763533622
2025-12-04T09:17:19.2634542Z  * [new tag]                 viable/strict/1763538576    -> viable/strict/1763538576
2025-12-04T09:17:19.2636207Z  * [new tag]                 viable/strict/1763545823    -> viable/strict/1763545823
2025-12-04T09:17:19.2637685Z  * [new tag]                 viable/strict/1763547951    -> viable/strict/1763547951
2025-12-04T09:17:19.2639385Z  * [new tag]                 viable/strict/1763551477    -> viable/strict/1763551477
2025-12-04T09:17:19.2640952Z  * [new tag]                 viable/strict/1763552982    -> viable/strict/1763552982
2025-12-04T09:17:19.2642582Z  * [new tag]                 viable/strict/1763594698    -> viable/strict/1763594698
2025-12-04T09:17:19.2644184Z  * [new tag]                 viable/strict/1763596178    -> viable/strict/1763596178
2025-12-04T09:17:19.2645798Z  * [new tag]                 viable/strict/1763599155    -> viable/strict/1763599155
2025-12-04T09:17:19.2647365Z  * [new tag]                 viable/strict/1763603717    -> viable/strict/1763603717
2025-12-04T09:17:19.2649021Z  * [new tag]                 viable/strict/1763606923    -> viable/strict/1763606923
2025-12-04T09:17:19.2650639Z  * [new tag]                 viable/strict/1763609715    -> viable/strict/1763609715
2025-12-04T09:17:19.2652243Z  * [new tag]                 viable/strict/1763612757    -> viable/strict/1763612757
2025-12-04T09:17:19.2653816Z  * [new tag]                 viable/strict/1763616325    -> viable/strict/1763616325
2025-12-04T09:17:19.2655427Z  * [new tag]                 viable/strict/1763623509    -> viable/strict/1763623509
2025-12-04T09:17:19.2657141Z  * [new tag]                 viable/strict/1763624984    -> viable/strict/1763624984
2025-12-04T09:17:19.2658900Z  * [new tag]                 viable/strict/1763628796    -> viable/strict/1763628796
2025-12-04T09:17:19.2660454Z  * [new tag]                 viable/strict/1763634343    -> viable/strict/1763634343
2025-12-04T09:17:19.2662015Z  * [new tag]                 viable/strict/1763635867    -> viable/strict/1763635867
2025-12-04T09:17:19.2663806Z  * [new tag]                 viable/strict/1763639382    -> viable/strict/1763639382
2025-12-04T09:17:19.2665348Z  * [new tag]                 viable/strict/1763646626    -> viable/strict/1763646626
2025-12-04T09:17:19.2667174Z  * [new tag]                 viable/strict/1763655997    -> viable/strict/1763655997
2025-12-04T09:17:19.2668892Z  * [new tag]                 viable/strict/1763659444    -> viable/strict/1763659444
2025-12-04T09:17:19.2670437Z  * [new tag]                 viable/strict/1763660992    -> viable/strict/1763660992
2025-12-04T09:17:19.2671983Z  * [new tag]                 viable/strict/1763663201    -> viable/strict/1763663201
2025-12-04T09:17:19.2673667Z  * [new tag]                 viable/strict/1763670362    -> viable/strict/1763670362
2025-12-04T09:17:19.2675051Z  * [new tag]                 viable/strict/1763675378    -> viable/strict/1763675378
2025-12-04T09:17:19.2676694Z  * [new tag]                 viable/strict/1763693343    -> viable/strict/1763693343
2025-12-04T09:17:19.2678259Z  * [new tag]                 viable/strict/1763696088    -> viable/strict/1763696088
2025-12-04T09:17:19.2679960Z  * [new tag]                 viable/strict/1763697343    -> viable/strict/1763697343
2025-12-04T09:17:19.2681582Z  * [new tag]                 viable/strict/1763699165    -> viable/strict/1763699165
2025-12-04T09:17:19.2683179Z  * [new tag]                 viable/strict/1763700660    -> viable/strict/1763700660
2025-12-04T09:17:19.2684732Z  * [new tag]                 viable/strict/1763704209    -> viable/strict/1763704209
2025-12-04T09:17:19.2686349Z  * [new tag]                 viable/strict/1763706411    -> viable/strict/1763706411
2025-12-04T09:17:19.2687928Z  * [new tag]                 viable/strict/1763708082    -> viable/strict/1763708082
2025-12-04T09:17:19.2689441Z  * [new tag]                 viable/strict/1763711381    -> viable/strict/1763711381
2025-12-04T09:17:19.2690957Z  * [new tag]                 viable/strict/1763713593    -> viable/strict/1763713593
2025-12-04T09:17:19.2692815Z  * [new tag]                 viable/strict/1763715201    -> viable/strict/1763715201
2025-12-04T09:17:19.2694436Z  * [new tag]                 viable/strict/1763733017    -> viable/strict/1763733017
2025-12-04T09:17:19.2696090Z  * [new tag]                 viable/strict/1763735108    -> viable/strict/1763735108
2025-12-04T09:17:19.2697668Z  * [new tag]                 viable/strict/1763749579    -> viable/strict/1763749579
2025-12-04T09:17:19.2699316Z  * [new tag]                 viable/strict/1763751113    -> viable/strict/1763751113
2025-12-04T09:17:19.2701013Z  * [new tag]                 viable/strict/1763753035    -> viable/strict/1763753035
2025-12-04T09:17:19.2702732Z  * [new tag]                 viable/strict/1763754578    -> viable/strict/1763754578
2025-12-04T09:17:19.2704279Z  * [new tag]                 viable/strict/1763756748    -> viable/strict/1763756748
2025-12-04T09:17:19.2705835Z  * [new tag]                 viable/strict/1763758205    -> viable/strict/1763758205
2025-12-04T09:17:19.2707285Z  * [new tag]                 viable/strict/1763764050    -> viable/strict/1763764050
2025-12-04T09:17:19.2708920Z  * [new tag]                 viable/strict/1763771887    -> viable/strict/1763771887
2025-12-04T09:17:19.2710786Z  * [new tag]                 viable/strict/1763773920    -> viable/strict/1763773920
2025-12-04T09:17:19.2712334Z  * [new tag]                 viable/strict/1763776501    -> viable/strict/1763776501
2025-12-04T09:17:19.2713901Z  * [new tag]                 viable/strict/1763779437    -> viable/strict/1763779437
2025-12-04T09:17:19.2715751Z  * [new tag]                 viable/strict/1763781038    -> viable/strict/1763781038
2025-12-04T09:17:19.2717365Z  * [new tag]                 viable/strict/1763782245    -> viable/strict/1763782245
2025-12-04T09:17:19.2718820Z  * [new tag]                 viable/strict/1763785568    -> viable/strict/1763785568
2025-12-04T09:17:19.2720564Z  * [new tag]                 viable/strict/1763787006    -> viable/strict/1763787006
2025-12-04T09:17:19.2722739Z  * [new tag]                 viable/strict/1763789103    -> viable/strict/1763789103
2025-12-04T09:17:19.2724242Z  * [new tag]                 viable/strict/1763790578    -> viable/strict/1763790578
2025-12-04T09:17:19.2725842Z  * [new tag]                 viable/strict/1763796275    -> viable/strict/1763796275
2025-12-04T09:17:19.2727690Z  * [new tag]                 viable/strict/1763801465    -> viable/strict/1763801465
2025-12-04T09:17:19.2729338Z  * [new tag]                 viable/strict/1763803522    -> viable/strict/1763803522
2025-12-04T09:17:19.2730861Z  * [new tag]                 viable/strict/1763808581    -> viable/strict/1763808581
2025-12-04T09:17:19.2732453Z  * [new tag]                 viable/strict/1763840977    -> viable/strict/1763840977
2025-12-04T09:17:19.2734041Z  * [new tag]                 viable/strict/1763846659    -> viable/strict/1763846659
2025-12-04T09:17:19.2735670Z  * [new tag]                 viable/strict/1763872065    -> viable/strict/1763872065
2025-12-04T09:17:19.2737297Z  * [new tag]                 viable/strict/1763873648    -> viable/strict/1763873648
2025-12-04T09:17:19.2738896Z  * [new tag]                 viable/strict/1763875506    -> viable/strict/1763875506
2025-12-04T09:17:19.2740502Z  * [new tag]                 viable/strict/1763889904    -> viable/strict/1763889904
2025-12-04T09:17:19.2742112Z  * [new tag]                 viable/strict/1763930999    -> viable/strict/1763930999
2025-12-04T09:17:19.2743761Z  * [new tag]                 viable/strict/1763944964    -> viable/strict/1763944964
2025-12-04T09:17:19.2745217Z  * [new tag]                 viable/strict/1763958474    -> viable/strict/1763958474
2025-12-04T09:17:19.2746836Z  * [new tag]                 viable/strict/1763967263    -> viable/strict/1763967263
2025-12-04T09:17:19.2748488Z  * [new tag]                 viable/strict/1763972803    -> viable/strict/1763972803
2025-12-04T09:17:19.2750052Z  * [new tag]                 viable/strict/1763976376    -> viable/strict/1763976376
2025-12-04T09:17:19.2751641Z  * [new tag]                 viable/strict/1763989404    -> viable/strict/1763989404
2025-12-04T09:17:19.2753225Z  * [new tag]                 viable/strict/1763990887    -> viable/strict/1763990887
2025-12-04T09:17:19.2754817Z  * [new tag]                 viable/strict/1764019919    -> viable/strict/1764019919
2025-12-04T09:17:19.2756534Z  * [new tag]                 viable/strict/1764023134    -> viable/strict/1764023134
2025-12-04T09:17:19.2757958Z  * [new tag]                 viable/strict/1764024593    -> viable/strict/1764024593
2025-12-04T09:17:19.2759565Z  * [new tag]                 viable/strict/1764026706    -> viable/strict/1764026706
2025-12-04T09:17:19.2761415Z  * [new tag]                 viable/strict/1764031139    -> viable/strict/1764031139
2025-12-04T09:17:19.2763027Z  * [new tag]                 viable/strict/1764033131    -> viable/strict/1764033131
2025-12-04T09:17:19.2764460Z  * [new tag]                 viable/strict/1764035725    -> viable/strict/1764035725
2025-12-04T09:17:19.2765913Z  * [new tag]                 viable/strict/1764624265    -> viable/strict/1764624265
2025-12-04T09:17:19.2767346Z  * [new tag]                 viable/strict/1764631514    -> viable/strict/1764631514
2025-12-04T09:17:19.2768772Z  * [new tag]                 viable/strict/1764632987    -> viable/strict/1764632987
2025-12-04T09:17:19.2770195Z  * [new tag]                 viable/strict/1764636063    -> viable/strict/1764636063
2025-12-04T09:17:19.2771766Z  * [new tag]                 viable/strict/1764643975    -> viable/strict/1764643975
2025-12-04T09:17:19.2773190Z  * [new tag]                 viable/strict/1764646859    -> viable/strict/1764646859
2025-12-04T09:17:19.2774724Z  * [new tag]                 viable/strict/1764653120    -> viable/strict/1764653120
2025-12-04T09:17:19.2776038Z  * [new tag]                 viable/strict/1764654632    -> viable/strict/1764654632
2025-12-04T09:17:19.2777462Z  * [new tag]                 viable/strict/1764656821    -> viable/strict/1764656821
2025-12-04T09:17:19.2778910Z  * [new tag]                 viable/strict/1764658557    -> viable/strict/1764658557
2025-12-04T09:17:19.2780410Z  * [new tag]                 viable/strict/1764660333    -> viable/strict/1764660333
2025-12-04T09:17:19.2781836Z  * [new tag]                 viable/strict/1764661812    -> viable/strict/1764661812
2025-12-04T09:17:19.2783286Z  * [new tag]                 viable/strict/1764664023    -> viable/strict/1764664023
2025-12-04T09:17:19.2784676Z  * [new tag]                 viable/strict/1764669150    -> viable/strict/1764669150
2025-12-04T09:17:19.2786193Z  * [new tag]                 viable/strict/1764680709    -> viable/strict/1764680709
2025-12-04T09:17:19.2787597Z  * [new tag]                 viable/strict/1764687619    -> viable/strict/1764687619
2025-12-04T09:17:19.2789087Z  * [new tag]                 viable/strict/1764696355    -> viable/strict/1764696355
2025-12-04T09:17:19.2790474Z  * [new tag]                 viable/strict/1764701767    -> viable/strict/1764701767
2025-12-04T09:17:19.2791903Z  * [new tag]                 viable/strict/1764710768    -> viable/strict/1764710768
2025-12-04T09:17:19.2793535Z  * [new tag]                 viable/strict/1764716202    -> viable/strict/1764716202
2025-12-04T09:17:19.2795043Z  * [new tag]                 viable/strict/1764793566    -> viable/strict/1764793566
2025-12-04T09:17:19.2796464Z  * [new tag]                 viable/strict/1764797093    -> viable/strict/1764797093
2025-12-04T09:17:19.2797906Z  * [new tag]                 viable/strict/1764800729    -> viable/strict/1764800729
2025-12-04T09:17:19.2799436Z  * [new tag]                 whc_flight_1                -> whc_flight_1
2025-12-04T09:17:19.2800952Z  * [new tag]                 whc_flight_2                -> whc_flight_2
2025-12-04T09:17:19.2802649Z  * [new tag]                 whc_flight_4                -> whc_flight_4
2025-12-04T09:17:19.3987953Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object}
2025-12-04T09:17:19.4020888Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:17:19.4026272Z ##[endgroup]
2025-12-04T09:17:19.4028164Z ##[group]Determining the checkout info
2025-12-04T09:17:19.4028604Z ##[endgroup]
2025-12-04T09:17:19.4032370Z [command]/usr/bin/git sparse-checkout disable
2025-12-04T09:17:19.4076223Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig
2025-12-04T09:17:19.4113910Z ##[group]Checking out the ref
2025-12-04T09:17:19.4117060Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:17:20.4421649Z Updating files:  65% (13152/20121)
2025-12-04T09:17:20.4504366Z Updating files:  66% (13280/20121)
2025-12-04T09:17:20.4588111Z Updating files:  67% (13482/20121)
2025-12-04T09:17:20.4670370Z Updating files:  68% (13683/20121)
2025-12-04T09:17:20.4885645Z Updating files:  69% (13884/20121)
2025-12-04T09:17:20.5214487Z Updating files:  70% (14085/20121)
2025-12-04T09:17:20.5284690Z Updating files:  71% (14286/20121)
2025-12-04T09:17:20.5378381Z Updating files:  72% (14488/20121)
2025-12-04T09:17:20.5599628Z Updating files:  73% (14689/20121)
2025-12-04T09:17:20.5877753Z Updating files:  74% (14890/20121)
2025-12-04T09:17:20.6426132Z Updating files:  75% (15091/20121)
2025-12-04T09:17:20.6606363Z Updating files:  76% (15292/20121)
2025-12-04T09:17:20.6773579Z Updating files:  77% (15494/20121)
2025-12-04T09:17:20.7015687Z Updating files:  78% (15695/20121)
2025-12-04T09:17:20.7313800Z Updating files:  79% (15896/20121)
2025-12-04T09:17:20.7672526Z Updating files:  80% (16097/20121)
2025-12-04T09:17:20.7997629Z Updating files:  81% (16299/20121)
2025-12-04T09:17:20.8257952Z Updating files:  82% (16500/20121)
2025-12-04T09:17:20.8449324Z Updating files:  83% (16701/20121)
2025-12-04T09:17:20.8627408Z Updating files:  84% (16902/20121)
2025-12-04T09:17:20.8827040Z Updating files:  85% (17103/20121)
2025-12-04T09:17:20.9021215Z Updating files:  86% (17305/20121)
2025-12-04T09:17:20.9202423Z Updating files:  87% (17506/20121)
2025-12-04T09:17:20.9352950Z Updating files:  88% (17707/20121)
2025-12-04T09:17:20.9528662Z Updating files:  89% (17908/20121)
2025-12-04T09:17:20.9738791Z Updating files:  90% (18109/20121)
2025-12-04T09:17:20.9890545Z Updating files:  91% (18311/20121)
2025-12-04T09:17:21.0085587Z Updating files:  92% (18512/20121)
2025-12-04T09:17:21.0315837Z Updating files:  93% (18713/20121)
2025-12-04T09:17:21.0559317Z Updating files:  94% (18914/20121)
2025-12-04T09:17:21.0771075Z Updating files:  95% (19115/20121)
2025-12-04T09:17:21.0969249Z Updating files:  96% (19317/20121)
2025-12-04T09:17:21.1171771Z Updating files:  97% (19518/20121)
2025-12-04T09:17:21.1497208Z Updating files:  98% (19719/20121)
2025-12-04T09:17:21.1711534Z Updating files:  99% (19920/20121)
2025-12-04T09:17:21.1712322Z Updating files: 100% (20121/20121)
2025-12-04T09:17:21.1712889Z Updating files: 100% (20121/20121), done.
2025-12-04T09:17:21.1997896Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'.
2025-12-04T09:17:21.1998339Z 
2025-12-04T09:17:21.1998636Z You are in 'detached HEAD' state. You can look around, make experimental
2025-12-04T09:17:21.1999326Z changes and commit them, and you can discard any commits you make in this
2025-12-04T09:17:21.1999866Z state without impacting any branches by switching back to a branch.
2025-12-04T09:17:21.2000195Z 
2025-12-04T09:17:21.2000396Z If you want to create a new branch to retain commits you create, you may
2025-12-04T09:17:21.2000896Z do so (now or later) by using -c with the switch command. Example:
2025-12-04T09:17:21.2001181Z 
2025-12-04T09:17:21.2001298Z   git switch -c <new-branch-name>
2025-12-04T09:17:21.2001492Z 
2025-12-04T09:17:21.2001598Z Or undo this operation with:
2025-12-04T09:17:21.2001786Z 
2025-12-04T09:17:21.2001870Z   git switch -
2025-12-04T09:17:21.2001995Z 
2025-12-04T09:17:21.2002232Z Turn off this advice by setting config variable advice.detachedHead to false
2025-12-04T09:17:21.2002578Z 
2025-12-04T09:17:21.2003583Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T09:17:21.2186787Z ##[endgroup]
2025-12-04T09:17:21.2187305Z ##[group]Setting up auth for fetching submodules
2025-12-04T09:17:21.2195703Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:17:21.2255054Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2025-12-04T09:17:21.2291022Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2025-12-04T09:17:21.2325202Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2025-12-04T09:17:21.2359179Z ##[endgroup]
2025-12-04T09:17:21.2359564Z ##[group]Fetching submodules
2025-12-04T09:17:21.2362030Z [command]/usr/bin/git submodule sync --recursive
2025-12-04T09:17:21.2775899Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2025-12-04T09:17:21.3178690Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2025-12-04T09:17:21.3181120Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2025-12-04T09:17:21.3184836Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2025-12-04T09:17:21.3188543Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2025-12-04T09:17:21.3192340Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX'
2025-12-04T09:17:21.3197032Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2025-12-04T09:17:21.3200474Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2025-12-04T09:17:21.3204696Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter'
2025-12-04T09:17:21.3209605Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2025-12-04T09:17:21.3214049Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel'
2025-12-04T09:17:21.3218332Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib'
2025-12-04T09:17:21.3223549Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2025-12-04T09:17:21.3228025Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2025-12-04T09:17:21.3232498Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2025-12-04T09:17:21.3237130Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2025-12-04T09:17:21.3241993Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention'
2025-12-04T09:17:21.3250134Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2025-12-04T09:17:21.3254940Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2025-12-04T09:17:21.3260081Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:17:21.3264945Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo'
2025-12-04T09:17:21.3270112Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2025-12-04T09:17:21.3275242Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2025-12-04T09:17:21.3283588Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2025-12-04T09:17:21.3286991Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2025-12-04T09:17:21.3292565Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai'
2025-12-04T09:17:21.3298365Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc'
2025-12-04T09:17:21.3304871Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2025-12-04T09:17:21.3311246Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2025-12-04T09:17:21.3317542Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp'
2025-12-04T09:17:21.3323341Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2025-12-04T09:17:21.3329499Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2025-12-04T09:17:21.3335762Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2025-12-04T09:17:21.3343253Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2025-12-04T09:17:21.3352574Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2025-12-04T09:17:21.3358806Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2025-12-04T09:17:21.3365312Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2025-12-04T09:17:21.3372073Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2025-12-04T09:17:21.3413445Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2025-12-04T09:17:21.5842894Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2025-12-04T09:17:21.5843665Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2025-12-04T09:17:21.5844353Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2025-12-04T09:17:21.5877493Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2025-12-04T09:17:24.5513905Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2025-12-04T09:17:24.5515287Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'...
2025-12-04T09:17:24.5516725Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2025-12-04T09:17:24.5517859Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2025-12-04T09:17:24.5519125Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2025-12-04T09:17:24.5520436Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'...
2025-12-04T09:17:24.5521852Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'...
2025-12-04T09:17:24.5523150Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2025-12-04T09:17:24.5524295Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2025-12-04T09:17:24.5525589Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'...
2025-12-04T09:17:24.5527403Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2025-12-04T09:17:24.5528619Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2025-12-04T09:17:24.5529867Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2025-12-04T09:17:24.5530927Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2025-12-04T09:17:24.5532274Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'...
2025-12-04T09:17:24.5533601Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2025-12-04T09:17:24.5746177Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2025-12-04T09:17:24.7118951Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2025-12-04T09:17:24.8135298Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2025-12-04T09:17:24.9009903Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2025-12-04T09:17:24.9937608Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2025-12-04T09:17:27.4113030Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2025-12-04T09:17:27.4114446Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2025-12-04T09:17:27.4116001Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2025-12-04T09:17:27.4117804Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2025-12-04T09:17:27.4120624Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2025-12-04T09:17:27.5114343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2025-12-04T09:17:45.6006170Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'...
2025-12-04T09:17:45.6008722Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2025-12-04T09:17:45.6010788Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'...
2025-12-04T09:17:45.6012756Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'...
2025-12-04T09:17:45.6013740Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2025-12-04T09:17:45.6226935Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2025-12-04T09:17:45.6400220Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2025-12-04T09:17:45.6540943Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2025-12-04T09:17:45.6894536Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2025-12-04T09:17:45.7964347Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6'
2025-12-04T09:17:45.8628159Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1'
2025-12-04T09:17:46.8699754Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883'
2025-12-04T09:17:47.0969844Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150'
2025-12-04T09:17:47.0998298Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:17:47.1033984Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'...
2025-12-04T09:17:52.4427811Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf'
2025-12-04T09:17:52.4754401Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f'
2025-12-04T09:17:52.9519214Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T09:17:53.0153184Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246'
2025-12-04T09:17:53.1341373Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc'
2025-12-04T09:17:53.1957918Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396'
2025-12-04T09:17:54.0361602Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588'
2025-12-04T09:17:54.2388768Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4'
2025-12-04T09:17:54.2419812Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit'
2025-12-04T09:17:54.2422821Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:17:54.2426631Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:17:54.2430830Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass'
2025-12-04T09:17:54.2435239Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest'
2025-12-04T09:17:54.2439450Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:17:54.2443253Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json'
2025-12-04T09:17:54.2478589Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'...
2025-12-04T09:17:55.4861405Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'...
2025-12-04T09:17:55.4862413Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'...
2025-12-04T09:17:55.4863698Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'...
2025-12-04T09:17:55.5863177Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'...
2025-12-04T09:17:59.2034016Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'...
2025-12-04T09:17:59.3035466Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'...
2025-12-04T09:18:02.2206475Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea'
2025-12-04T09:18:02.6946540Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T09:18:02.8163612Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349'
2025-12-04T09:18:03.6352233Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8'
2025-12-04T09:18:03.6917572Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:18:03.7079780Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691'
2025-12-04T09:18:03.8426534Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03'
2025-12-04T09:18:03.9381766Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5'
2025-12-04T09:18:03.9408672Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:18:03.9411655Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:18:03.9448609Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'...
2025-12-04T09:18:08.7141094Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'...
2025-12-04T09:18:09.0518540Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33'
2025-12-04T09:18:09.7785233Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420'
2025-12-04T09:18:09.9663648Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757'
2025-12-04T09:18:10.0042793Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f'
2025-12-04T09:18:10.0528400Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2025-12-04T09:18:10.0884465Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341'
2025-12-04T09:18:10.1445469Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:18:10.1626547Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3'
2025-12-04T09:18:10.1651294Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2025-12-04T09:18:10.1685603Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2025-12-04T09:18:28.2784813Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d'
2025-12-04T09:18:28.3063562Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959'
2025-12-04T09:18:28.4097182Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943'
2025-12-04T09:18:28.4124998Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:18:28.4128309Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:18:28.4132325Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:18:28.4168690Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'...
2025-12-04T09:18:29.1555059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2025-12-04T09:18:29.7559748Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2025-12-04T09:18:29.8660365Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
2025-12-04T09:18:29.8684313Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:18:29.8688332Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:18:29.8692375Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:18:29.8696581Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:18:29.8700879Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:18:29.8705354Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:18:29.8710179Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:18:29.8714527Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:18:29.8719236Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:18:29.8753826Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'...
2025-12-04T09:18:31.8046819Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'...
2025-12-04T09:18:31.8048232Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'...
2025-12-04T09:18:31.8049840Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'...
2025-12-04T09:18:31.8051123Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'...
2025-12-04T09:18:31.8052633Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'...
2025-12-04T09:18:31.8053989Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'...
2025-12-04T09:18:31.8055319Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'...
2025-12-04T09:18:31.9047801Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'...
2025-12-04T09:18:37.8830066Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9'
2025-12-04T09:18:37.9085425Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400'
2025-12-04T09:18:37.9547686Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05'
2025-12-04T09:18:37.9739826Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067'
2025-12-04T09:18:37.9762873Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:18:37.9797596Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'...
2025-12-04T09:18:38.2686040Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4'
2025-12-04T09:18:38.2938505Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446'
2025-12-04T09:18:38.3504032Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:18:38.4795156Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5'
2025-12-04T09:18:38.5021197Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150'
2025-12-04T09:18:38.5265343Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
2025-12-04T09:18:38.5288857Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:18:38.5292104Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:18:38.5328313Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T09:18:40.7925230Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T09:18:41.0864212Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
2025-12-04T09:18:41.1443246Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T09:18:41.1855486Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
2025-12-04T09:18:41.2423729Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:18:41.3144499Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe'
2025-12-04T09:18:41.3658298Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e'
2025-12-04T09:18:41.5013887Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72'
2025-12-04T09:18:42.1317714Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83'
2025-12-04T09:18:42.1361614Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2025-12-04T09:18:42.1397339Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2025-12-04T09:18:43.0646174Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4'
2025-12-04T09:18:43.1644751Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878'
2025-12-04T09:18:43.1673537Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:18:43.1676465Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:18:43.1680291Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:18:43.1684440Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:18:43.1688659Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:18:43.1692726Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:18:43.1696957Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:18:43.1701218Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:18:43.1736643Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'...
2025-12-04T09:18:43.6221425Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'...
2025-12-04T09:18:43.6223141Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'...
2025-12-04T09:18:43.6224752Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'...
2025-12-04T09:18:43.6226236Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'...
2025-12-04T09:18:43.7223052Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'...
2025-12-04T09:18:44.4338067Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'...
2025-12-04T09:18:51.5171887Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'...
2025-12-04T09:18:52.2572918Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2'
2025-12-04T09:18:52.3082836Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1'
2025-12-04T09:18:52.3311881Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa'
2025-12-04T09:18:52.4663713Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d'
2025-12-04T09:18:52.4852212Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce'
2025-12-04T09:18:52.5064776Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5'
2025-12-04T09:18:52.5293726Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d'
2025-12-04T09:18:52.5319217Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:18:52.5321821Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:18:52.5355739Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T09:18:54.7770210Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T09:18:55.0695181Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4'
2025-12-04T09:18:55.1270687Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T09:18:55.8581130Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50'
2025-12-04T09:18:55.8746388Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa'
2025-12-04T09:18:56.2144877Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a'
2025-12-04T09:18:56.2175579Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:18:56.2178365Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest'
2025-12-04T09:18:56.2213100Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'...
2025-12-04T09:18:56.7713700Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'...
2025-12-04T09:18:57.2251570Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
2025-12-04T09:18:57.3126600Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
2025-12-04T09:18:57.3264142Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
2025-12-04T09:18:57.3435073Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8'
2025-12-04T09:18:57.3997459Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8'
2025-12-04T09:18:57.4370135Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67'
2025-12-04T09:18:57.4916511Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68'
2025-12-04T09:18:57.5308091Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d'
2025-12-04T09:18:57.5336957Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:18:57.5338389Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:18:57.5341242Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:18:57.5345174Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:18:57.5381688Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'...
2025-12-04T09:18:58.7351989Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'...
2025-12-04T09:18:58.7353029Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'...
2025-12-04T09:18:58.7648530Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'...
2025-12-04T09:18:58.8339838Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
2025-12-04T09:18:58.8562253Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281'
2025-12-04T09:18:58.9464575Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b'
2025-12-04T09:18:58.9851516Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
2025-12-04T09:18:58.9874614Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:18:58.9909528Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'...
2025-12-04T09:18:59.1809651Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2025-12-04T09:18:59.1861435Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0
2025-12-04T09:18:59.2263686Z Entering 'android/libs/fbjni'
2025-12-04T09:18:59.2325717Z Entering 'third_party/FP16'
2025-12-04T09:18:59.2391081Z Entering 'third_party/FXdiv'
2025-12-04T09:18:59.2453331Z Entering 'third_party/NNPACK'
2025-12-04T09:18:59.2512765Z Entering 'third_party/NVTX'
2025-12-04T09:18:59.2573595Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:18:59.2634255Z Entering 'third_party/XNNPACK'
2025-12-04T09:18:59.2709884Z Entering 'third_party/aiter'
2025-12-04T09:18:59.2771834Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:18:59.2842803Z Entering 'third_party/benchmark'
2025-12-04T09:18:59.2903795Z Entering 'third_party/composable_kernel'
2025-12-04T09:18:59.2972698Z Entering 'third_party/cpp-httplib'
2025-12-04T09:18:59.3032700Z Entering 'third_party/cpuinfo'
2025-12-04T09:18:59.3094052Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:18:59.3153659Z Entering 'third_party/cutlass'
2025-12-04T09:18:59.3223410Z Entering 'third_party/fbgemm'
2025-12-04T09:18:59.3282792Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:18:59.3340804Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:18:59.3410840Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:18:59.3471362Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:18:59.3538081Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:18:59.3594871Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:18:59.3651096Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:18:59.3713871Z Entering 'third_party/flash-attention'
2025-12-04T09:18:59.3772413Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:18:59.3835959Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:18:59.3904142Z Entering 'third_party/flatbuffers'
2025-12-04T09:18:59.3965923Z Entering 'third_party/fmt'
2025-12-04T09:18:59.4032597Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:18:59.4091443Z Entering 'third_party/gloo'
2025-12-04T09:18:59.4152441Z Entering 'third_party/googletest'
2025-12-04T09:18:59.4213106Z Entering 'third_party/ideep'
2025-12-04T09:18:59.4269761Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:18:59.4336699Z Entering 'third_party/ittapi'
2025-12-04T09:18:59.4399577Z Entering 'third_party/kineto'
2025-12-04T09:18:59.4461791Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:18:59.4518021Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:18:59.4576822Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:18:59.4636837Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:18:59.4695125Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:18:59.4760674Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:18:59.4822886Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:18:59.4880439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:18:59.4939340Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:18:59.5004337Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:18:59.5061823Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:18:59.5119048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:18:59.5180867Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:18:59.5248240Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:18:59.5311168Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:18:59.5373380Z Entering 'third_party/kleidiai'
2025-12-04T09:18:59.5435197Z Entering 'third_party/mimalloc'
2025-12-04T09:18:59.5494813Z Entering 'third_party/nlohmann'
2025-12-04T09:18:59.5557303Z Entering 'third_party/onnx'
2025-12-04T09:18:59.5635142Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:18:59.5698676Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:18:59.5761425Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:18:59.5819797Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:18:59.5875860Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:18:59.5932564Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:18:59.5991141Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:18:59.6047688Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:18:59.6109828Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:18:59.6164658Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:18:59.6225815Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:18:59.6290317Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:18:59.6372120Z Entering 'third_party/pocketfft'
2025-12-04T09:18:59.6434895Z Entering 'third_party/protobuf'
2025-12-04T09:18:59.6496634Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:18:59.6555749Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:18:59.6618028Z Entering 'third_party/psimd'
2025-12-04T09:18:59.6678736Z Entering 'third_party/pthreadpool'
2025-12-04T09:18:59.6738427Z Entering 'third_party/pybind11'
2025-12-04T09:18:59.6798761Z Entering 'third_party/python-peachpy'
2025-12-04T09:18:59.6859217Z Entering 'third_party/sleef'
2025-12-04T09:18:59.6919020Z Entering 'third_party/tensorpipe'
2025-12-04T09:18:59.6978790Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:18:59.7036642Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:18:59.7093333Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:18:59.7152019Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:18:59.7207632Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:18:59.7291464Z ##[endgroup]
2025-12-04T09:18:59.7292119Z ##[group]Persisting credentials for submodules
2025-12-04T09:18:59.7297677Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :"
2025-12-04T09:18:59.7690229Z Entering 'android/libs/fbjni'
2025-12-04T09:18:59.7769331Z Entering 'third_party/FP16'
2025-12-04T09:18:59.7850377Z Entering 'third_party/FXdiv'
2025-12-04T09:18:59.7928954Z Entering 'third_party/NNPACK'
2025-12-04T09:18:59.8007482Z Entering 'third_party/NVTX'
2025-12-04T09:18:59.8089019Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:18:59.8168347Z Entering 'third_party/XNNPACK'
2025-12-04T09:18:59.8262015Z Entering 'third_party/aiter'
2025-12-04T09:18:59.8342396Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:18:59.8431800Z Entering 'third_party/benchmark'
2025-12-04T09:18:59.8512447Z Entering 'third_party/composable_kernel'
2025-12-04T09:18:59.8602503Z Entering 'third_party/cpp-httplib'
2025-12-04T09:18:59.8682100Z Entering 'third_party/cpuinfo'
2025-12-04T09:18:59.8761925Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:18:59.8841156Z Entering 'third_party/cutlass'
2025-12-04T09:18:59.8931329Z Entering 'third_party/fbgemm'
2025-12-04T09:18:59.9015808Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:18:59.9091495Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:18:59.9177453Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:18:59.9253865Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:18:59.9344272Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:18:59.9422159Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:18:59.9497093Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:18:59.9579192Z Entering 'third_party/flash-attention'
2025-12-04T09:18:59.9658709Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:18:59.9743985Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:18:59.9830659Z Entering 'third_party/flatbuffers'
2025-12-04T09:18:59.9914156Z Entering 'third_party/fmt'
2025-12-04T09:18:59.9992603Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:19:00.0073564Z Entering 'third_party/gloo'
2025-12-04T09:19:00.0153280Z Entering 'third_party/googletest'
2025-12-04T09:19:00.0232764Z Entering 'third_party/ideep'
2025-12-04T09:19:00.0310434Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:19:00.0396439Z Entering 'third_party/ittapi'
2025-12-04T09:19:00.0475312Z Entering 'third_party/kineto'
2025-12-04T09:19:00.0555144Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:19:00.0632721Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:19:00.0711644Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:19:00.0789065Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:19:00.0867001Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:19:00.0944604Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:19:00.1025400Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:19:00.1102126Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:19:00.1183588Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:19:00.1262612Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:19:00.1345144Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:19:00.1422433Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:00.1506281Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:00.1591394Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:19:00.1667708Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:19:00.1746601Z Entering 'third_party/kleidiai'
2025-12-04T09:19:00.1828666Z Entering 'third_party/mimalloc'
2025-12-04T09:19:00.1909015Z Entering 'third_party/nlohmann'
2025-12-04T09:19:00.1989590Z Entering 'third_party/onnx'
2025-12-04T09:19:00.2084578Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:19:00.2171282Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:19:00.2251378Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:19:00.2328510Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:19:00.2411934Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:19:00.2492945Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:19:00.2574271Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:19:00.2649701Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:19:00.2726289Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:19:00.2800734Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:00.2879894Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:00.2958913Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:19:00.3058015Z Entering 'third_party/pocketfft'
2025-12-04T09:19:00.3137583Z Entering 'third_party/protobuf'
2025-12-04T09:19:00.3220272Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:19:00.3298688Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:19:00.3378765Z Entering 'third_party/psimd'
2025-12-04T09:19:00.3457121Z Entering 'third_party/pthreadpool'
2025-12-04T09:19:00.3535844Z Entering 'third_party/pybind11'
2025-12-04T09:19:00.3615203Z Entering 'third_party/python-peachpy'
2025-12-04T09:19:00.3693756Z Entering 'third_party/sleef'
2025-12-04T09:19:00.3773852Z Entering 'third_party/tensorpipe'
2025-12-04T09:19:00.3853866Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:19:00.3935043Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:19:00.4014422Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:19:00.4089663Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:19:00.4170918Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:00.4279386Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url"
2025-12-04T09:19:00.4676802Z Entering 'android/libs/fbjni'
2025-12-04T09:19:00.4750373Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T09:19:00.4777067Z Entering 'third_party/FP16'
2025-12-04T09:19:00.4848869Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T09:19:00.4875945Z Entering 'third_party/FXdiv'
2025-12-04T09:19:00.4963938Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T09:19:00.4992000Z Entering 'third_party/NNPACK'
2025-12-04T09:19:00.5067234Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T09:19:00.5093905Z Entering 'third_party/NVTX'
2025-12-04T09:19:00.5168074Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T09:19:00.5193368Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:19:00.5265797Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T09:19:00.5289356Z Entering 'third_party/XNNPACK'
2025-12-04T09:19:00.5365535Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T09:19:00.5404978Z Entering 'third_party/aiter'
2025-12-04T09:19:00.5485677Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T09:19:00.5512875Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:19:00.5584982Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T09:19:00.5619620Z Entering 'third_party/benchmark'
2025-12-04T09:19:00.5688655Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:19:00.5714518Z Entering 'third_party/composable_kernel'
2025-12-04T09:19:00.5792871Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T09:19:00.5827096Z Entering 'third_party/cpp-httplib'
2025-12-04T09:19:00.5899819Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T09:19:00.5925773Z Entering 'third_party/cpuinfo'
2025-12-04T09:19:00.5998159Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T09:19:00.6026073Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:19:00.6099430Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T09:19:00.6124112Z Entering 'third_party/cutlass'
2025-12-04T09:19:00.6192834Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T09:19:00.6232104Z Entering 'third_party/fbgemm'
2025-12-04T09:19:00.6303157Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T09:19:00.6331037Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:19:00.6401117Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T09:19:00.6426670Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:19:00.6500576Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T09:19:00.6532744Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:19:00.6604633Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T09:19:00.6629517Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:19:00.6697563Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T09:19:00.6732105Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:19:00.6803520Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T09:19:00.6831973Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:19:00.6903143Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T09:19:00.6926744Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:19:00.7002287Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T09:19:00.7033518Z Entering 'third_party/flash-attention'
2025-12-04T09:19:00.7103625Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T09:19:00.7128379Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:19:00.7205035Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T09:19:00.7239572Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:19:00.7312167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T09:19:00.7347118Z Entering 'third_party/flatbuffers'
2025-12-04T09:19:00.7420893Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T09:19:00.7449470Z Entering 'third_party/fmt'
2025-12-04T09:19:00.7525315Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T09:19:00.7551114Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:19:00.7624582Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T09:19:00.7649085Z Entering 'third_party/gloo'
2025-12-04T09:19:00.7724126Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T09:19:00.7750261Z Entering 'third_party/googletest'
2025-12-04T09:19:00.7831025Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:19:00.7855779Z Entering 'third_party/ideep'
2025-12-04T09:19:00.7933928Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T09:19:00.7959476Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:19:00.8031234Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T09:19:00.8065909Z Entering 'third_party/ittapi'
2025-12-04T09:19:00.8136591Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T09:19:00.8162923Z Entering 'third_party/kineto'
2025-12-04T09:19:00.8237020Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T09:19:00.8261507Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:19:00.8335021Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T09:19:00.8357030Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:19:00.8439350Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T09:19:00.8465884Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:19:00.8549383Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T09:19:00.8574111Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:19:00.8646586Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T09:19:00.8670671Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:19:00.8742485Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T09:19:00.8764185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:19:00.8835367Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T09:19:00.8863574Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:19:00.8935909Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T09:19:00.8959892Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:19:00.9033886Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:19:00.9058960Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:19:00.9131030Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T09:19:00.9155775Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:19:00.9228505Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T09:19:00.9254383Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:19:00.9328131Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T09:19:00.9351445Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:00.9424388Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T09:19:00.9451258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:00.9524530Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T09:19:00.9559127Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:19:00.9630289Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T09:19:00.9653492Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:19:00.9723639Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T09:19:00.9752366Z Entering 'third_party/kleidiai'
2025-12-04T09:19:00.9824640Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T09:19:00.9850468Z Entering 'third_party/mimalloc'
2025-12-04T09:19:00.9926244Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T09:19:00.9951676Z Entering 'third_party/nlohmann'
2025-12-04T09:19:01.0026624Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T09:19:01.0052363Z Entering 'third_party/onnx'
2025-12-04T09:19:01.0125124Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T09:19:01.0166404Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:19:01.0238539Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:19:01.0268181Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:19:01.0340627Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T09:19:01.0366105Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:19:01.0435453Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:19:01.0458651Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:19:01.0532324Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:19:01.0556018Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:19:01.0627692Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T09:19:01.0651346Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:19:01.0727496Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T09:19:01.0754487Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:19:01.0828161Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T09:19:01.0850958Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:19:01.0926496Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T09:19:01.0950506Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:19:01.1029535Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T09:19:01.1043132Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:01.1116355Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T09:19:01.1142788Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:01.1216519Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T09:19:01.1243474Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:19:01.1318300Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T09:19:01.1365625Z Entering 'third_party/pocketfft'
2025-12-04T09:19:01.1438654Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T09:19:01.1461792Z Entering 'third_party/protobuf'
2025-12-04T09:19:01.1533786Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T09:19:01.1560859Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:19:01.1633536Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:19:01.1657906Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:19:01.1732131Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:19:01.1760057Z Entering 'third_party/psimd'
2025-12-04T09:19:01.1830682Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T09:19:01.1856578Z Entering 'third_party/pthreadpool'
2025-12-04T09:19:01.1925966Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T09:19:01.1951380Z Entering 'third_party/pybind11'
2025-12-04T09:19:01.2022038Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:19:01.2047601Z Entering 'third_party/python-peachpy'
2025-12-04T09:19:01.2120430Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T09:19:01.2145413Z Entering 'third_party/sleef'
2025-12-04T09:19:01.2218915Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T09:19:01.2243439Z Entering 'third_party/tensorpipe'
2025-12-04T09:19:01.2314371Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T09:19:01.2337926Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:19:01.2407492Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:19:01.2433584Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:19:01.2503592Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T09:19:01.2527053Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:19:01.2596914Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T09:19:01.2620015Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:19:01.2689366Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:19:01.2710910Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:01.2784440Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T09:19:01.3971496Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:'
2025-12-04T09:19:01.4369592Z Entering 'android/libs/fbjni'
2025-12-04T09:19:01.4429854Z Entering 'third_party/FP16'
2025-12-04T09:19:01.4490368Z Entering 'third_party/FXdiv'
2025-12-04T09:19:01.4554428Z Entering 'third_party/NNPACK'
2025-12-04T09:19:01.4615654Z Entering 'third_party/NVTX'
2025-12-04T09:19:01.4676990Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:19:01.4736946Z Entering 'third_party/XNNPACK'
2025-12-04T09:19:01.4819142Z Entering 'third_party/aiter'
2025-12-04T09:19:01.4878326Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:19:01.4948106Z Entering 'third_party/benchmark'
2025-12-04T09:19:01.5008826Z Entering 'third_party/composable_kernel'
2025-12-04T09:19:01.5077743Z Entering 'third_party/cpp-httplib'
2025-12-04T09:19:01.5138263Z Entering 'third_party/cpuinfo'
2025-12-04T09:19:01.5199359Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:19:01.5261025Z Entering 'third_party/cutlass'
2025-12-04T09:19:01.5333067Z Entering 'third_party/fbgemm'
2025-12-04T09:19:01.5396986Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:19:01.5454214Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:19:01.5519441Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:19:01.5577757Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:19:01.5643725Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:19:01.5700895Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:19:01.5761623Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:19:01.5825114Z Entering 'third_party/flash-attention'
2025-12-04T09:19:01.5884591Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:19:01.5948060Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:19:01.6017848Z Entering 'third_party/flatbuffers'
2025-12-04T09:19:01.6080985Z Entering 'third_party/fmt'
2025-12-04T09:19:01.6140333Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:19:01.6203094Z Entering 'third_party/gloo'
2025-12-04T09:19:01.6263601Z Entering 'third_party/googletest'
2025-12-04T09:19:01.6324367Z Entering 'third_party/ideep'
2025-12-04T09:19:01.6383512Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:19:01.6450947Z Entering 'third_party/ittapi'
2025-12-04T09:19:01.6511288Z Entering 'third_party/kineto'
2025-12-04T09:19:01.6569795Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:19:01.6632313Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:19:01.6692548Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:19:01.6750253Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:19:01.6808979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:19:01.6865329Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:19:01.6928130Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:19:01.6985083Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:19:01.7043885Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:19:01.7102232Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:19:01.7161130Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:19:01.7218569Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:01.7280576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:01.7348541Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:19:01.7405318Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:19:01.7469023Z Entering 'third_party/kleidiai'
2025-12-04T09:19:01.7530322Z Entering 'third_party/mimalloc'
2025-12-04T09:19:01.7590391Z Entering 'third_party/nlohmann'
2025-12-04T09:19:01.7654987Z Entering 'third_party/onnx'
2025-12-04T09:19:01.7729744Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:19:01.7794357Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:19:01.7857188Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:19:01.7915506Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:19:01.7972882Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:19:01.8029442Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:19:01.8087254Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:19:01.8145140Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:19:01.8202909Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:19:01.8259570Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:01.8318694Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:01.8378770Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:19:01.8458362Z Entering 'third_party/pocketfft'
2025-12-04T09:19:01.8518650Z Entering 'third_party/protobuf'
2025-12-04T09:19:01.8580987Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:19:01.8639802Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:19:01.8700853Z Entering 'third_party/psimd'
2025-12-04T09:19:01.8762177Z Entering 'third_party/pthreadpool'
2025-12-04T09:19:01.8823567Z Entering 'third_party/pybind11'
2025-12-04T09:19:01.8883983Z Entering 'third_party/python-peachpy'
2025-12-04T09:19:01.8944900Z Entering 'third_party/sleef'
2025-12-04T09:19:01.9004608Z Entering 'third_party/tensorpipe'
2025-12-04T09:19:01.9068839Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:19:01.9125552Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:19:01.9184955Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:19:01.9242379Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:19:01.9301399Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:01.9390995Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:'
2025-12-04T09:19:01.9788157Z Entering 'android/libs/fbjni'
2025-12-04T09:19:01.9849774Z Entering 'third_party/FP16'
2025-12-04T09:19:01.9912026Z Entering 'third_party/FXdiv'
2025-12-04T09:19:01.9977154Z Entering 'third_party/NNPACK'
2025-12-04T09:19:02.0038496Z Entering 'third_party/NVTX'
2025-12-04T09:19:02.0099374Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:19:02.0160767Z Entering 'third_party/XNNPACK'
2025-12-04T09:19:02.0238148Z Entering 'third_party/aiter'
2025-12-04T09:19:02.0300824Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:19:02.0369821Z Entering 'third_party/benchmark'
2025-12-04T09:19:02.0433169Z Entering 'third_party/composable_kernel'
2025-12-04T09:19:02.0502412Z Entering 'third_party/cpp-httplib'
2025-12-04T09:19:02.0564810Z Entering 'third_party/cpuinfo'
2025-12-04T09:19:02.0626451Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:19:02.0687284Z Entering 'third_party/cutlass'
2025-12-04T09:19:02.0758360Z Entering 'third_party/fbgemm'
2025-12-04T09:19:02.0821555Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:19:02.0879042Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:19:02.0945776Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:19:02.1002037Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:19:02.1070066Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:19:02.1128372Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:19:02.1184033Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:19:02.1246039Z Entering 'third_party/flash-attention'
2025-12-04T09:19:02.1305104Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:19:02.1371795Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:19:02.1440620Z Entering 'third_party/flatbuffers'
2025-12-04T09:19:02.1506500Z Entering 'third_party/fmt'
2025-12-04T09:19:02.1567517Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:19:02.1635448Z Entering 'third_party/gloo'
2025-12-04T09:19:02.1695832Z Entering 'third_party/googletest'
2025-12-04T09:19:02.1756845Z Entering 'third_party/ideep'
2025-12-04T09:19:02.1815441Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:19:02.1884760Z Entering 'third_party/ittapi'
2025-12-04T09:19:02.1945656Z Entering 'third_party/kineto'
2025-12-04T09:19:02.2004545Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:19:02.2061577Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:19:02.2123229Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:19:02.2180736Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:19:02.2239971Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:19:02.2295655Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:19:02.2358996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:19:02.2416250Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:19:02.2475803Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:19:02.2535795Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:19:02.2595010Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:19:02.2662264Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:02.2721322Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:02.2789509Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:19:02.2851678Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:19:02.2913352Z Entering 'third_party/kleidiai'
2025-12-04T09:19:02.2973529Z Entering 'third_party/mimalloc'
2025-12-04T09:19:02.3033926Z Entering 'third_party/nlohmann'
2025-12-04T09:19:02.3095427Z Entering 'third_party/onnx'
2025-12-04T09:19:02.3172025Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:19:02.3236698Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:19:02.3298251Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:19:02.3356190Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:19:02.3415615Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:19:02.3470893Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:19:02.3530003Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:19:02.3589249Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:19:02.3645924Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:19:02.3700697Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:02.3762825Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:02.3824290Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:19:02.3904757Z Entering 'third_party/pocketfft'
2025-12-04T09:19:02.3965500Z Entering 'third_party/protobuf'
2025-12-04T09:19:02.4027773Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:19:02.4085040Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:19:02.4151464Z Entering 'third_party/psimd'
2025-12-04T09:19:02.4212224Z Entering 'third_party/pthreadpool'
2025-12-04T09:19:02.4272482Z Entering 'third_party/pybind11'
2025-12-04T09:19:02.4333302Z Entering 'third_party/python-peachpy'
2025-12-04T09:19:02.4393775Z Entering 'third_party/sleef'
2025-12-04T09:19:02.4455299Z Entering 'third_party/tensorpipe'
2025-12-04T09:19:02.4515556Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:19:02.4572927Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:19:02.4630436Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:19:02.4687973Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:19:02.4742683Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:02.4826517Z ##[endgroup]
2025-12-04T09:19:02.4874034Z [command]/usr/bin/git log -1 --format=%H
2025-12-04T09:19:02.4902958Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:19:02.5035893Z ##[group]Run cd "${GITHUB_WORKSPACE}"
2025-12-04T09:19:02.5036233Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:19:02.5036530Z # Clean stale submodule dirs
2025-12-04T09:19:02.5036840Z if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:19:02.5037221Z   sudo git submodule foreach --recursive git clean -ffdx
2025-12-04T09:19:02.5037590Z else
2025-12-04T09:19:02.5037881Z   git submodule foreach --recursive git clean -ffdx
2025-12-04T09:19:02.5038237Z fi
2025-12-04T09:19:02.5049945Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:02.5050311Z env:
2025-12-04T09:19:02.5050516Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:02.5050761Z   NO_SUDO: true
2025-12-04T09:19:02.5050970Z ##[endgroup]
2025-12-04T09:19:02.5480167Z Entering 'android/libs/fbjni'
2025-12-04T09:19:02.5530583Z Entering 'third_party/FP16'
2025-12-04T09:19:02.5575801Z Entering 'third_party/FXdiv'
2025-12-04T09:19:02.5621540Z Entering 'third_party/NNPACK'
2025-12-04T09:19:02.5672464Z Entering 'third_party/NVTX'
2025-12-04T09:19:02.5729250Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:19:02.5777703Z Entering 'third_party/XNNPACK'
2025-12-04T09:19:02.5937902Z Entering 'third_party/aiter'
2025-12-04T09:19:02.5997179Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:19:02.6147120Z Entering 'third_party/benchmark'
2025-12-04T09:19:02.6197396Z Entering 'third_party/composable_kernel'
2025-12-04T09:19:02.6358622Z Entering 'third_party/cpp-httplib'
2025-12-04T09:19:02.6407390Z Entering 'third_party/cpuinfo'
2025-12-04T09:19:02.6460695Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:19:02.6512228Z Entering 'third_party/cutlass'
2025-12-04T09:19:02.6646783Z Entering 'third_party/fbgemm'
2025-12-04T09:19:02.6731953Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:19:02.6775527Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:19:02.6931405Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:19:02.6981775Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:19:02.7117644Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:19:02.7163599Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:19:02.7204463Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:19:02.7268668Z Entering 'third_party/flash-attention'
2025-12-04T09:19:02.7338082Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:19:02.7469026Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:19:02.7589380Z Entering 'third_party/flatbuffers'
2025-12-04T09:19:02.7690522Z Entering 'third_party/fmt'
2025-12-04T09:19:02.7738323Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:19:02.7790839Z Entering 'third_party/gloo'
2025-12-04T09:19:02.7840266Z Entering 'third_party/googletest'
2025-12-04T09:19:02.7889513Z Entering 'third_party/ideep'
2025-12-04T09:19:02.7933322Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:19:02.8056116Z Entering 'third_party/ittapi'
2025-12-04T09:19:02.8105420Z Entering 'third_party/kineto'
2025-12-04T09:19:02.8155575Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:19:02.8206104Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:19:02.8270758Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:19:02.8315118Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:19:02.8360568Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:19:02.8401219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:19:02.8447951Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:19:02.8491878Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:19:02.8540380Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:19:02.8595424Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:19:02.8639185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:19:02.8683371Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:02.8750894Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:02.8806830Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:19:02.8850959Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:19:02.8900546Z Entering 'third_party/kleidiai'
2025-12-04T09:19:02.8958141Z Entering 'third_party/mimalloc'
2025-12-04T09:19:02.9009947Z Entering 'third_party/nlohmann'
2025-12-04T09:19:02.9077427Z Entering 'third_party/onnx'
2025-12-04T09:19:02.9543952Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:19:02.9597284Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:19:02.9676260Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:19:02.9720414Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:19:02.9766253Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:19:02.9813625Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:19:02.9871571Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:19:02.9916278Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:19:02.9959356Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:19:03.0002699Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:03.0068905Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:03.0119434Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:19:03.0480610Z Entering 'third_party/pocketfft'
2025-12-04T09:19:03.0531458Z Entering 'third_party/protobuf'
2025-12-04T09:19:03.0639361Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:19:03.0683028Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:19:03.0737212Z Entering 'third_party/psimd'
2025-12-04T09:19:03.0781443Z Entering 'third_party/pthreadpool'
2025-12-04T09:19:03.0826581Z Entering 'third_party/pybind11'
2025-12-04T09:19:03.0876721Z Entering 'third_party/python-peachpy'
2025-12-04T09:19:03.0923820Z Entering 'third_party/sleef'
2025-12-04T09:19:03.0973210Z Entering 'third_party/tensorpipe'
2025-12-04T09:19:03.1023302Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:19:03.1072495Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:19:03.1120354Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:19:03.1169129Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:19:03.1210961Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:03.1391197Z Prepare all required actions
2025-12-04T09:19:03.1391731Z Getting action download info
2025-12-04T09:19:03.2884990Z ##[group]Run ./.github/actions/setup-linux
2025-12-04T09:19:03.2885296Z env:
2025-12-04T09:19:03.2885506Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:03.2885759Z ##[endgroup]
2025-12-04T09:19:03.2923953Z ##[group]Run set -euo pipefail
2025-12-04T09:19:03.2924269Z set -euo pipefail
2025-12-04T09:19:03.2924558Z function get_ec2_metadata() {
2025-12-04T09:19:03.2924927Z   # Pulled from instance metadata endpoint for EC2
2025-12-04T09:19:03.2925547Z   # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
2025-12-04T09:19:03.2926113Z   category=$1
2025-12-04T09:19:03.2926626Z   # If it is GCP runner (runner name contains gcp), do not run this
2025-12-04T09:19:03.2927047Z   runner_name_str=i-0f694664a515f0ebd
2025-12-04T09:19:03.2927411Z   if [[ -f /.inarc ]]; then
2025-12-04T09:19:03.2927742Z     echo "ARC Runner, no info on ec2 metadata"
2025-12-04T09:19:03.2928123Z   elif [[ $runner_name_str == *"gcp"* ]]; then
2025-12-04T09:19:03.2928579Z     echo "Runner is from Google Cloud Platform, No info on ec2 metadata"
2025-12-04T09:19:03.2929017Z   else
2025-12-04T09:19:03.2929867Z     curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}"
2025-12-04T09:19:03.2930771Z   fi
2025-12-04T09:19:03.2930974Z }
2025-12-04T09:19:03.2931230Z echo "ami-id: $(get_ec2_metadata ami-id)"
2025-12-04T09:19:03.2931654Z echo "instance-id: $(get_ec2_metadata instance-id)"
2025-12-04T09:19:03.2932129Z echo "instance-type: $(get_ec2_metadata instance-type)"
2025-12-04T09:19:03.2932539Z echo "system info $(uname -a)"
2025-12-04T09:19:03.2941777Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:03.2942139Z env:
2025-12-04T09:19:03.2942336Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:03.2942577Z ##[endgroup]
2025-12-04T09:19:03.3132079Z ami-id: ami-08982f1c5bf93d976
2025-12-04T09:19:03.3255841Z instance-id: i-0f694664a515f0ebd
2025-12-04T09:19:03.3383323Z instance-type: g5.4xlarge
2025-12-04T09:19:03.3399065Z system info Linux ip-10-0-18-14.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep  9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2025-12-04T09:19:03.3424854Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T09:19:03.3425448Z if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T09:19:03.3435264Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:03.3435636Z env:
2025-12-04T09:19:03.3435833Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:03.3436091Z ##[endgroup]
2025-12-04T09:19:04.9561322Z Thu Dec  4 09:19:04 2025       
2025-12-04T09:19:04.9561879Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:19:04.9562412Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:19:04.9562925Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:19:04.9563450Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:19:04.9564007Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:19:04.9564464Z |                                         |                        |               MIG M. |
2025-12-04T09:19:04.9564812Z |=========================================+========================+======================|
2025-12-04T09:19:04.9658805Z |   0  NVIDIA A10G                    Off |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:19:04.9659652Z |  0%   24C    P0             52W /  300W |       0MiB /  23028MiB |      3%      Default |
2025-12-04T09:19:04.9660061Z |                                         |                        |                  N/A |
2025-12-04T09:19:04.9660477Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:19:04.9660779Z 
2025-12-04T09:19:04.9661126Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:19:04.9661570Z | Processes:                                                                              |
2025-12-04T09:19:04.9662043Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:19:04.9662479Z |        ID   ID                                                               Usage      |
2025-12-04T09:19:04.9662988Z |=========================================================================================|
2025-12-04T09:19:04.9664135Z |  No running processes found                                                             |
2025-12-04T09:19:04.9664640Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:19:05.4070462Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:19:05.4071429Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:19:05.4084632Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:05.4085006Z env:
2025-12-04T09:19:05.4085216Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:05.4085470Z ##[endgroup]
2025-12-04T09:19:05.4150939Z ##[group]Run if systemctl is-active --quiet docker; then
2025-12-04T09:19:05.4151394Z if systemctl is-active --quiet docker; then
2025-12-04T09:19:05.4151775Z     echo "Docker daemon is running...";
2025-12-04T09:19:05.4152104Z else
2025-12-04T09:19:05.4152446Z     echo "Starting docker daemon..." && sudo systemctl start docker;
2025-12-04T09:19:05.4152865Z fi
2025-12-04T09:19:05.4161621Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:05.4162046Z env:
2025-12-04T09:19:05.4162243Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:05.4162500Z ##[endgroup]
2025-12-04T09:19:05.4266646Z Docker daemon is running...
2025-12-04T09:19:05.4306829Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:19:05.4307126Z with:
2025-12-04T09:19:05.4307318Z   shell: bash
2025-12-04T09:19:05.4307524Z   timeout_minutes: 5
2025-12-04T09:19:05.4308056Z   max_attempts: 3
2025-12-04T09:19:05.4308289Z   retry_wait_seconds: 30
2025-12-04T09:19:05.4310643Z   command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
    --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

# For LF Runners we need to make sure we also login to Meta's ECR docker registry too.
META_AWS_ACCOUNT_ID=308535385114
if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then
    aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
        --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
fi

2025-12-04T09:19:05.4313014Z   polling_interval_seconds: 1
2025-12-04T09:19:05.4313284Z   warning_on_retry: true
2025-12-04T09:19:05.4313535Z   continue_on_error: false
2025-12-04T09:19:05.4313767Z env:
2025-12-04T09:19:05.4313967Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:05.4314224Z   AWS_RETRY_MODE: standard
2025-12-04T09:19:05.4314466Z   AWS_MAX_ATTEMPTS: 5
2025-12-04T09:19:05.4314710Z   AWS_DEFAULT_REGION: us-east-1
2025-12-04T09:19:05.4314976Z ##[endgroup]
2025-12-04T09:19:06.5779334Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:19:06.5780598Z Configure a credential helper to remove this warning. See
2025-12-04T09:19:06.5781345Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:19:06.5781822Z 
2025-12-04T09:19:06.5781934Z Login Succeeded
2025-12-04T09:19:07.5162343Z Command completed after 1 attempt(s).
2025-12-04T09:19:07.5247910Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:19:07.5248416Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:19:07.5248869Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:19:07.5260951Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:07.5261317Z env:
2025-12-04T09:19:07.5261509Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.5261753Z ##[endgroup]
2025-12-04T09:19:07.5358931Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T09:19:07.5359482Z # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T09:19:07.5359979Z # shellcheck disable=SC2046
2025-12-04T09:19:07.5360296Z docker stop $(docker ps -q) || true
2025-12-04T09:19:07.5360627Z # Prune all of the docker images
2025-12-04T09:19:07.5360941Z docker system prune -af
2025-12-04T09:19:07.5369472Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:07.5369828Z env:
2025-12-04T09:19:07.5370036Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.5370284Z ##[endgroup]
2025-12-04T09:19:07.5714959Z "docker stop" requires at least 1 argument.
2025-12-04T09:19:07.5715358Z See 'docker stop --help'.
2025-12-04T09:19:07.5715527Z 
2025-12-04T09:19:07.5715687Z Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]
2025-12-04T09:19:07.5715964Z 
2025-12-04T09:19:07.5716067Z Stop one or more running containers
2025-12-04T09:19:07.5954816Z Total reclaimed space: 0B
2025-12-04T09:19:07.6150856Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main
2025-12-04T09:19:07.6151331Z with:
2025-12-04T09:19:07.6152148Z   docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6153071Z   use-custom-docker-registry: true
2025-12-04T09:19:07.6153389Z   docker-build-dir: .ci/docker
2025-12-04T09:19:07.6153683Z   docker-build-script: ./build.sh
2025-12-04T09:19:07.6153977Z   working-directory: .
2025-12-04T09:19:07.6154328Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:07.6154724Z   force-push: false
2025-12-04T09:19:07.6154946Z env:
2025-12-04T09:19:07.6155145Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.6155396Z ##[endgroup]
2025-12-04T09:19:07.6173965Z ##[group]Run set -ex
2025-12-04T09:19:07.6174229Z set -ex
2025-12-04T09:19:07.6174460Z 
2025-12-04T09:19:07.6174871Z # If the docker build directory or the build script doesn't exist, the action will
2025-12-04T09:19:07.6175542Z # gracefully return the docker image name as it is.  Pulling docker image in Linux
2025-12-04T09:19:07.6176128Z # job could then download the pre-built image as usual
2025-12-04T09:19:07.6176823Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then
2025-12-04T09:19:07.6177514Z   echo "skip=false" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6177841Z else
2025-12-04T09:19:07.6178088Z   echo "skip=true" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6178537Z   echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6178931Z 
2025-12-04T09:19:07.6179544Z   echo "Not using custom ECR registry.  Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..."
2025-12-04T09:19:07.6180189Z   exit 0
2025-12-04T09:19:07.6180401Z fi
2025-12-04T09:19:07.6180607Z 
2025-12-04T09:19:07.6180944Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then
2025-12-04T09:19:07.6181544Z   # The docker image name already includes the ECR prefix and tag, so we can just
2025-12-04T09:19:07.6182070Z   # use it as it is, but first let's extract the tag
2025-12-04T09:19:07.6182549Z   DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}')
2025-12-04T09:19:07.6183060Z   echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6183548Z   echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6183959Z else
2025-12-04T09:19:07.6184226Z   if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then
2025-12-04T09:19:07.6184609Z     CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:}
2025-12-04T09:19:07.6185177Z     DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*}
2025-12-04T09:19:07.6185513Z   fi
2025-12-04T09:19:07.6185975Z   DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}")
2025-12-04T09:19:07.6186587Z   echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6187243Z   echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6188008Z   echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6188433Z fi
2025-12-04T09:19:07.6197634Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:07.6198008Z env:
2025-12-04T09:19:07.6198216Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.6198468Z   REPO_NAME: pytorch
2025-12-04T09:19:07.6199447Z   DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6200356Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T09:19:07.6213528Z   DOCKER_BUILD_SCRIPT: ./build.sh
2025-12-04T09:19:07.6213917Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:07.6214314Z   USE_CUSTOM_DOCKER_REGISTRY: true
2025-12-04T09:19:07.6214597Z   CUSTOM_TAG_PREFIX: 
2025-12-04T09:19:07.6214830Z ##[endgroup]
2025-12-04T09:19:07.6245977Z + [[ -d .ci/docker ]]
2025-12-04T09:19:07.6246669Z + [[ -f .ci/docker/./build.sh ]]
2025-12-04T09:19:07.6247129Z + [[ true == \t\r\u\e ]]
2025-12-04T09:19:07.6247503Z + echo skip=false
2025-12-04T09:19:07.6248912Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]]
2025-12-04T09:19:07.6255820Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6256686Z ++ awk -F '[:,]' '{print $2}'
2025-12-04T09:19:07.6292606Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6293523Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6294735Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6322622Z ##[group]Run set +e
2025-12-04T09:19:07.6322903Z set +e
2025-12-04T09:19:07.6323110Z set -x
2025-12-04T09:19:07.6323335Z 
2025-12-04T09:19:07.6323603Z login() {
2025-12-04T09:19:07.6324147Z   aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"
2025-12-04T09:19:07.6324670Z }
2025-12-04T09:19:07.6324871Z 
2025-12-04T09:19:07.6325068Z retry () {
2025-12-04T09:19:07.6325322Z   $*  || (sleep 1 && $*) || (sleep 2 && $*)
2025-12-04T09:19:07.6325630Z }
2025-12-04T09:19:07.6325826Z 
2025-12-04T09:19:07.6326044Z retry login "${DOCKER_REGISTRY}"
2025-12-04T09:19:07.6326347Z 
2025-12-04T09:19:07.6326554Z START_TIME=$(date +%s)
2025-12-04T09:19:07.6326832Z # Wait up to 120 minutes
2025-12-04T09:19:07.6327229Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do
2025-12-04T09:19:07.6327733Z   # Check if image already exists, if it does then skip building it
2025-12-04T09:19:07.6328222Z   if docker manifest inspect "${DOCKER_IMAGE}"; then
2025-12-04T09:19:07.6328571Z     exit 0
2025-12-04T09:19:07.6328794Z   fi
2025-12-04T09:19:07.6329002Z 
2025-12-04T09:19:07.6329571Z   # NB: This flag is used by Docker build workflow to push the image to ECR, so we can
2025-12-04T09:19:07.6330240Z   # use this to differentiate between the Docker build and regular build jobs. For the
2025-12-04T09:19:07.6330906Z   # latter, it will wait for the Docker images to become available before continuing
2025-12-04T09:19:07.6331427Z   if [ "${DOCKER_PUSH:-false}" == "true" ]; then
2025-12-04T09:19:07.6331813Z     # It's a Docker build job, let's build the image
2025-12-04T09:19:07.6332160Z     break
2025-12-04T09:19:07.6332382Z   else
2025-12-04T09:19:07.6332706Z     # It's a regular build job, wait for the image to become available
2025-12-04T09:19:07.6333113Z     sleep 300
2025-12-04T09:19:07.6333355Z   fi
2025-12-04T09:19:07.6333558Z done
2025-12-04T09:19:07.6333751Z 
2025-12-04T09:19:07.6334091Z # NB: This part requires a full checkout. Otherwise, the merge base will
2025-12-04T09:19:07.6334830Z # be empty.  The default action would be to continue rebuild the image
2025-12-04T09:19:07.6335340Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
2025-12-04T09:19:07.6335799Z   # if we're on the base branch then use the parent commit
2025-12-04T09:19:07.6336198Z   MERGE_BASE=$(git rev-parse HEAD~)
2025-12-04T09:19:07.6336500Z else
2025-12-04T09:19:07.6336803Z   # otherwise we're on a PR, so use the most recent base commit
2025-12-04T09:19:07.6337274Z   MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
2025-12-04T09:19:07.6337636Z fi
2025-12-04T09:19:07.6337827Z 
2025-12-04T09:19:07.6338047Z if [[ -z "${MERGE_BASE}" ]]; then
2025-12-04T09:19:07.6338398Z   echo "rebuild=true" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6338711Z 
2025-12-04T09:19:07.6339272Z   echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..."
2025-12-04T09:19:07.6339837Z   exit 0
2025-12-04T09:19:07.6340054Z fi
2025-12-04T09:19:07.6340247Z 
2025-12-04T09:19:07.6340544Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then
2025-12-04T09:19:07.6341230Z   echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit"
2025-12-04T09:19:07.6341805Z   exit 1
2025-12-04T09:19:07.6342018Z fi
2025-12-04T09:19:07.6342215Z 
2025-12-04T09:19:07.6342562Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}")
2025-12-04T09:19:07.6343221Z # If no image exists but the hash is the same as the previous hash then we should error out here
2025-12-04T09:19:07.6343804Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then
2025-12-04T09:19:07.6344501Z   echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
2025-12-04T09:19:07.6345287Z   echo "         Will re-build docker image to store in local cache, TTS may be longer"
2025-12-04T09:19:07.6345741Z fi
2025-12-04T09:19:07.6345943Z 
2025-12-04T09:19:07.6346202Z echo "rebuild=true" >> "${GITHUB_OUTPUT}"
2025-12-04T09:19:07.6355401Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:07.6355777Z env:
2025-12-04T09:19:07.6355979Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.6356237Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T09:19:07.6356574Z   BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:19:07.6357569Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6358730Z   DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:07.6359499Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:07.6359900Z   DOCKER_PUSH: 
2025-12-04T09:19:07.6360122Z ##[endgroup]
2025-12-04T09:19:07.6389971Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:07.6390386Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:07.6393044Z + aws ecr get-login-password --region us-east-1
2025-12-04T09:19:07.6395094Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:08.1589916Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:19:08.1590535Z Configure a credential helper to remove this warning. See
2025-12-04T09:19:08.1591102Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:19:08.1591495Z 
2025-12-04T09:19:08.1598978Z Login Succeeded
2025-12-04T09:19:08.1626607Z ++ date +%s
2025-12-04T09:19:08.1641090Z + START_TIME=1764839948
2025-12-04T09:19:08.1644709Z ++ date +%s
2025-12-04T09:19:08.1658309Z + [[ 1764832748 -lt 1764839948 ]]
2025-12-04T09:19:08.1659292Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:08.3836192Z {
2025-12-04T09:19:08.3836488Z 	"schemaVersion": 2,
2025-12-04T09:19:08.3837050Z 	"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
2025-12-04T09:19:08.3837653Z 	"config": {
2025-12-04T09:19:08.3838059Z 		"mediaType": "application/vnd.docker.container.image.v1+json",
2025-12-04T09:19:08.3838547Z 		"size": 34864,
2025-12-04T09:19:08.3839053Z 		"digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301"
2025-12-04T09:19:08.3839641Z 	},
2025-12-04T09:19:08.3839868Z 	"layers": [
2025-12-04T09:19:08.3840118Z 		{
2025-12-04T09:19:08.3840518Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3841033Z 			"size": 30447951,
2025-12-04T09:19:08.3841586Z 			"digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63"
2025-12-04T09:19:08.3842180Z 		},
2025-12-04T09:19:08.3842410Z 		{
2025-12-04T09:19:08.3842805Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3843314Z 			"size": 1554,
2025-12-04T09:19:08.3843805Z 			"digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a"
2025-12-04T09:19:08.3844369Z 		},
2025-12-04T09:19:08.3844605Z 		{
2025-12-04T09:19:08.3845008Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3845518Z 			"size": 313275661,
2025-12-04T09:19:08.3846066Z 			"digest": "sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c"
2025-12-04T09:19:08.3846672Z 		},
2025-12-04T09:19:08.3846902Z 		{
2025-12-04T09:19:08.3847307Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3847825Z 			"size": 787,
2025-12-04T09:19:08.3848350Z 			"digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f"
2025-12-04T09:19:08.3848953Z 		},
2025-12-04T09:19:08.3849190Z 		{
2025-12-04T09:19:08.3849598Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3850101Z 			"size": 106,
2025-12-04T09:19:08.3850613Z 			"digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234"
2025-12-04T09:19:08.3851205Z 		},
2025-12-04T09:19:08.3851577Z 		{
2025-12-04T09:19:08.3851985Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3852506Z 			"size": 703,
2025-12-04T09:19:08.3853003Z 			"digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af"
2025-12-04T09:19:08.3853594Z 		},
2025-12-04T09:19:08.3853833Z 		{
2025-12-04T09:19:08.3854238Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3854760Z 			"size": 1216,
2025-12-04T09:19:08.3855261Z 			"digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0"
2025-12-04T09:19:08.3856222Z 		},
2025-12-04T09:19:08.3856458Z 		{
2025-12-04T09:19:08.3856876Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3857440Z 			"size": 483,
2025-12-04T09:19:08.3857918Z 			"digest": "sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628"
2025-12-04T09:19:08.3858507Z 		},
2025-12-04T09:19:08.3858732Z 		{
2025-12-04T09:19:08.3859238Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3859776Z 			"size": 110361875,
2025-12-04T09:19:08.3860313Z 			"digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77"
2025-12-04T09:19:08.3860907Z 		},
2025-12-04T09:19:08.3861139Z 		{
2025-12-04T09:19:08.3861545Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3862079Z 			"size": 4961,
2025-12-04T09:19:08.3862609Z 			"digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2"
2025-12-04T09:19:08.3863224Z 		},
2025-12-04T09:19:08.3863460Z 		{
2025-12-04T09:19:08.3864079Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3864621Z 			"size": 1755,
2025-12-04T09:19:08.3865144Z 			"digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15"
2025-12-04T09:19:08.3865746Z 		},
2025-12-04T09:19:08.3865979Z 		{
2025-12-04T09:19:08.3866396Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3866928Z 			"size": 724,
2025-12-04T09:19:08.3867442Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T09:19:08.3868043Z 		},
2025-12-04T09:19:08.3868286Z 		{
2025-12-04T09:19:08.3868694Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3869212Z 			"size": 543,
2025-12-04T09:19:08.3869696Z 			"digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066"
2025-12-04T09:19:08.3870158Z 		},
2025-12-04T09:19:08.3870339Z 		{
2025-12-04T09:19:08.3870657Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3871060Z 			"size": 3185190117,
2025-12-04T09:19:08.3871498Z 			"digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735"
2025-12-04T09:19:08.3871980Z 		},
2025-12-04T09:19:08.3872152Z 		{
2025-12-04T09:19:08.3872463Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3872866Z 			"size": 32,
2025-12-04T09:19:08.3873264Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3873795Z 		},
2025-12-04T09:19:08.3873984Z 		{
2025-12-04T09:19:08.3874306Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3874710Z 			"size": 396,
2025-12-04T09:19:08.3875118Z 			"digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5"
2025-12-04T09:19:08.3875592Z 		},
2025-12-04T09:19:08.3875763Z 		{
2025-12-04T09:19:08.3876078Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3876499Z 			"size": 236860,
2025-12-04T09:19:08.3876901Z 			"digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee"
2025-12-04T09:19:08.3877374Z 		},
2025-12-04T09:19:08.3877564Z 		{
2025-12-04T09:19:08.3877874Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3878287Z 			"size": 231,
2025-12-04T09:19:08.3878699Z 			"digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6"
2025-12-04T09:19:08.3879171Z 		},
2025-12-04T09:19:08.3879349Z 		{
2025-12-04T09:19:08.3879665Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3880077Z 			"size": 3043497,
2025-12-04T09:19:08.3880488Z 			"digest": "sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035"
2025-12-04T09:19:08.3880970Z 		},
2025-12-04T09:19:08.3881153Z 		{
2025-12-04T09:19:08.3881460Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3881981Z 			"size": 1472,
2025-12-04T09:19:08.3882396Z 			"digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42"
2025-12-04T09:19:08.3882870Z 		},
2025-12-04T09:19:08.3883045Z 		{
2025-12-04T09:19:08.3883358Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3883763Z 			"size": 481,
2025-12-04T09:19:08.3884167Z 			"digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979"
2025-12-04T09:19:08.3884640Z 		},
2025-12-04T09:19:08.3884818Z 		{
2025-12-04T09:19:08.3885130Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3885538Z 			"size": 202,
2025-12-04T09:19:08.3885956Z 			"digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb"
2025-12-04T09:19:08.3886425Z 		},
2025-12-04T09:19:08.3886605Z 		{
2025-12-04T09:19:08.3886926Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3887371Z 			"size": 607,
2025-12-04T09:19:08.3887898Z 			"digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5"
2025-12-04T09:19:08.3888381Z 		},
2025-12-04T09:19:08.3888553Z 		{
2025-12-04T09:19:08.3888869Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3889291Z 			"size": 7889619584,
2025-12-04T09:19:08.3889707Z 			"digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b"
2025-12-04T09:19:08.3890178Z 		},
2025-12-04T09:19:08.3890353Z 		{
2025-12-04T09:19:08.3890669Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3891071Z 			"size": 830,
2025-12-04T09:19:08.3891487Z 			"digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b"
2025-12-04T09:19:08.3891959Z 		},
2025-12-04T09:19:08.3892138Z 		{
2025-12-04T09:19:08.3892461Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3892876Z 			"size": 33451739,
2025-12-04T09:19:08.3893300Z 			"digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a"
2025-12-04T09:19:08.3893773Z 		},
2025-12-04T09:19:08.3893955Z 		{
2025-12-04T09:19:08.3894268Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3894679Z 			"size": 104,
2025-12-04T09:19:08.3895101Z 			"digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb"
2025-12-04T09:19:08.3895585Z 		},
2025-12-04T09:19:08.3895762Z 		{
2025-12-04T09:19:08.3896093Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3896513Z 			"size": 1496,
2025-12-04T09:19:08.3896940Z 			"digest": "sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3"
2025-12-04T09:19:08.3897415Z 		},
2025-12-04T09:19:08.3897592Z 		{
2025-12-04T09:19:08.3897926Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3898359Z 			"size": 458787828,
2025-12-04T09:19:08.3898788Z 			"digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6"
2025-12-04T09:19:08.3899348Z 		},
2025-12-04T09:19:08.3899534Z 		{
2025-12-04T09:19:08.3899860Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3912183Z 			"size": 164,
2025-12-04T09:19:08.3912766Z 			"digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98"
2025-12-04T09:19:08.3913244Z 		},
2025-12-04T09:19:08.3913414Z 		{
2025-12-04T09:19:08.3913738Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3914146Z 			"size": 346,
2025-12-04T09:19:08.3914554Z 			"digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35"
2025-12-04T09:19:08.3915014Z 		},
2025-12-04T09:19:08.3915194Z 		{
2025-12-04T09:19:08.3915515Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3915917Z 			"size": 32,
2025-12-04T09:19:08.3916329Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3916997Z 		},
2025-12-04T09:19:08.3917177Z 		{
2025-12-04T09:19:08.3917490Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3917901Z 			"size": 106,
2025-12-04T09:19:08.3918305Z 			"digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52"
2025-12-04T09:19:08.3918782Z 		},
2025-12-04T09:19:08.3918962Z 		{
2025-12-04T09:19:08.3919271Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3919675Z 			"size": 424,
2025-12-04T09:19:08.3920099Z 			"digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e"
2025-12-04T09:19:08.3920583Z 		},
2025-12-04T09:19:08.3920760Z 		{
2025-12-04T09:19:08.3921070Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3921482Z 			"size": 19309374,
2025-12-04T09:19:08.3921900Z 			"digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50"
2025-12-04T09:19:08.3922560Z 		},
2025-12-04T09:19:08.3922730Z 		{
2025-12-04T09:19:08.3923166Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3923570Z 			"size": 108,
2025-12-04T09:19:08.3923967Z 			"digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e"
2025-12-04T09:19:08.3924418Z 		},
2025-12-04T09:19:08.3924597Z 		{
2025-12-04T09:19:08.3924905Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3925298Z 			"size": 827,
2025-12-04T09:19:08.3925694Z 			"digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e"
2025-12-04T09:19:08.3926159Z 		},
2025-12-04T09:19:08.3926334Z 		{
2025-12-04T09:19:08.3926634Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3927044Z 			"size": 724,
2025-12-04T09:19:08.3927492Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T09:19:08.3927940Z 		},
2025-12-04T09:19:08.3928118Z 		{
2025-12-04T09:19:08.3928442Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3928835Z 			"size": 149,
2025-12-04T09:19:08.3929230Z 			"digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a"
2025-12-04T09:19:08.3929690Z 		},
2025-12-04T09:19:08.3929861Z 		{
2025-12-04T09:19:08.3930176Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3930577Z 			"size": 138,
2025-12-04T09:19:08.3930984Z 			"digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d"
2025-12-04T09:19:08.3931447Z 		},
2025-12-04T09:19:08.3931619Z 		{
2025-12-04T09:19:08.3931936Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3932332Z 			"size": 141,
2025-12-04T09:19:08.3932735Z 			"digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099"
2025-12-04T09:19:08.3933204Z 		},
2025-12-04T09:19:08.3933395Z 		{
2025-12-04T09:19:08.3933802Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3934230Z 			"size": 32,
2025-12-04T09:19:08.3934631Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3935107Z 		},
2025-12-04T09:19:08.3935287Z 		{
2025-12-04T09:19:08.3935592Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3935997Z 			"size": 223,
2025-12-04T09:19:08.3936405Z 			"digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503"
2025-12-04T09:19:08.3936872Z 		},
2025-12-04T09:19:08.3937041Z 		{
2025-12-04T09:19:08.3937367Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3937789Z 			"size": 255,
2025-12-04T09:19:08.3938188Z 			"digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241"
2025-12-04T09:19:08.3938658Z 		},
2025-12-04T09:19:08.3938827Z 		{
2025-12-04T09:19:08.3939197Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3939705Z 			"size": 145520672,
2025-12-04T09:19:08.3940124Z 			"digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7"
2025-12-04T09:19:08.3940581Z 		},
2025-12-04T09:19:08.3940752Z 		{
2025-12-04T09:19:08.3941055Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3941447Z 			"size": 106,
2025-12-04T09:19:08.3941837Z 			"digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784"
2025-12-04T09:19:08.3942287Z 		},
2025-12-04T09:19:08.3942461Z 		{
2025-12-04T09:19:08.3942765Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3943171Z 			"size": 312293530,
2025-12-04T09:19:08.3943587Z 			"digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b"
2025-12-04T09:19:08.3944047Z 		},
2025-12-04T09:19:08.3944223Z 		{
2025-12-04T09:19:08.3944546Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3944950Z 			"size": 3058011133,
2025-12-04T09:19:08.3945450Z 			"digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a"
2025-12-04T09:19:08.3945914Z 		},
2025-12-04T09:19:08.3946083Z 		{
2025-12-04T09:19:08.3946392Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3946797Z 			"size": 129,
2025-12-04T09:19:08.3947196Z 			"digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287"
2025-12-04T09:19:08.3947662Z 		},
2025-12-04T09:19:08.3947830Z 		{
2025-12-04T09:19:08.3948137Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3948527Z 			"size": 880,
2025-12-04T09:19:08.3948928Z 			"digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5"
2025-12-04T09:19:08.3949402Z 		},
2025-12-04T09:19:08.3949570Z 		{
2025-12-04T09:19:08.3949881Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3950283Z 			"size": 724,
2025-12-04T09:19:08.3950684Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T09:19:08.3951142Z 		},
2025-12-04T09:19:08.3951324Z 		{
2025-12-04T09:19:08.3951629Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3952036Z 			"size": 139,
2025-12-04T09:19:08.3952433Z 			"digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c"
2025-12-04T09:19:08.3952892Z 		},
2025-12-04T09:19:08.3953062Z 		{
2025-12-04T09:19:08.3953377Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3953780Z 			"size": 32,
2025-12-04T09:19:08.3954182Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3954659Z 		},
2025-12-04T09:19:08.3954840Z 		{
2025-12-04T09:19:08.3955141Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3955543Z 			"size": 159,
2025-12-04T09:19:08.3955942Z 			"digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3"
2025-12-04T09:19:08.3956419Z 		},
2025-12-04T09:19:08.3956598Z 		{
2025-12-04T09:19:08.3956906Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3957336Z 			"size": 1011,
2025-12-04T09:19:08.3957761Z 			"digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd"
2025-12-04T09:19:08.3958233Z 		},
2025-12-04T09:19:08.3958403Z 		{
2025-12-04T09:19:08.3958711Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3959115Z 			"size": 724,
2025-12-04T09:19:08.3959498Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T09:19:08.3959943Z 		},
2025-12-04T09:19:08.3960119Z 		{
2025-12-04T09:19:08.3960429Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3960820Z 			"size": 134,
2025-12-04T09:19:08.3961220Z 			"digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea"
2025-12-04T09:19:08.3961774Z 		},
2025-12-04T09:19:08.3961937Z 		{
2025-12-04T09:19:08.3962247Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3962644Z 			"size": 32,
2025-12-04T09:19:08.3963040Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3963499Z 		},
2025-12-04T09:19:08.3963673Z 		{
2025-12-04T09:19:08.3963977Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3964375Z 			"size": 157,
2025-12-04T09:19:08.3964781Z 			"digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8"
2025-12-04T09:19:08.3965261Z 		},
2025-12-04T09:19:08.3965426Z 		{
2025-12-04T09:19:08.3965743Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3966154Z 			"size": 602,
2025-12-04T09:19:08.3966550Z 			"digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc"
2025-12-04T09:19:08.3967023Z 		},
2025-12-04T09:19:08.3967202Z 		{
2025-12-04T09:19:08.3967651Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3968051Z 			"size": 724,
2025-12-04T09:19:08.3968443Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T09:19:08.3968897Z 		},
2025-12-04T09:19:08.3969068Z 		{
2025-12-04T09:19:08.3969383Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3969791Z 			"size": 155,
2025-12-04T09:19:08.3970188Z 			"digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f"
2025-12-04T09:19:08.3970657Z 		},
2025-12-04T09:19:08.3970837Z 		{
2025-12-04T09:19:08.3971138Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3971534Z 			"size": 32,
2025-12-04T09:19:08.3971932Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3972386Z 		},
2025-12-04T09:19:08.3972553Z 		{
2025-12-04T09:19:08.3972867Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3973267Z 			"size": 188,
2025-12-04T09:19:08.3973674Z 			"digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539"
2025-12-04T09:19:08.3974150Z 		},
2025-12-04T09:19:08.3974330Z 		{
2025-12-04T09:19:08.3974639Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3975040Z 			"size": 1370,
2025-12-04T09:19:08.3975448Z 			"digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694"
2025-12-04T09:19:08.3975917Z 		},
2025-12-04T09:19:08.3976073Z 		{
2025-12-04T09:19:08.3976370Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3976781Z 			"size": 32,
2025-12-04T09:19:08.3977184Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3977682Z 		},
2025-12-04T09:19:08.3977883Z 		{
2025-12-04T09:19:08.3978190Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3978606Z 			"size": 136,
2025-12-04T09:19:08.3979104Z 			"digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d"
2025-12-04T09:19:08.3979566Z 		},
2025-12-04T09:19:08.3979736Z 		{
2025-12-04T09:19:08.3980043Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3980438Z 			"size": 528,
2025-12-04T09:19:08.3980834Z 			"digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8"
2025-12-04T09:19:08.3981312Z 		},
2025-12-04T09:19:08.3981484Z 		{
2025-12-04T09:19:08.3981783Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3982180Z 			"size": 32,
2025-12-04T09:19:08.3982583Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3983048Z 		},
2025-12-04T09:19:08.3983231Z 		{
2025-12-04T09:19:08.3983532Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3984014Z 			"size": 104,
2025-12-04T09:19:08.3984431Z 			"digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4"
2025-12-04T09:19:08.3984912Z 		},
2025-12-04T09:19:08.3985077Z 		{
2025-12-04T09:19:08.3985405Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3985827Z 			"size": 435,
2025-12-04T09:19:08.3986238Z 			"digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44"
2025-12-04T09:19:08.3986714Z 		},
2025-12-04T09:19:08.3986899Z 		{
2025-12-04T09:19:08.3987224Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3987621Z 			"size": 32,
2025-12-04T09:19:08.3988038Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.3988515Z 		},
2025-12-04T09:19:08.3988679Z 		{
2025-12-04T09:19:08.3988992Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3989398Z 			"size": 109,
2025-12-04T09:19:08.3989924Z 			"digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d"
2025-12-04T09:19:08.3990399Z 		},
2025-12-04T09:19:08.3990581Z 		{
2025-12-04T09:19:08.3990893Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3991309Z 			"size": 1896,
2025-12-04T09:19:08.3991721Z 			"digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352"
2025-12-04T09:19:08.3992189Z 		},
2025-12-04T09:19:08.3992363Z 		{
2025-12-04T09:19:08.3992680Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3993093Z 			"size": 197526165,
2025-12-04T09:19:08.3993493Z 			"digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44"
2025-12-04T09:19:08.3993952Z 		},
2025-12-04T09:19:08.3994136Z 		{
2025-12-04T09:19:08.3994448Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3994867Z 			"size": 106,
2025-12-04T09:19:08.3995281Z 			"digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3"
2025-12-04T09:19:08.3995757Z 		},
2025-12-04T09:19:08.3995939Z 		{
2025-12-04T09:19:08.3996255Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3996669Z 			"size": 165,
2025-12-04T09:19:08.3997061Z 			"digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb"
2025-12-04T09:19:08.3997574Z 		},
2025-12-04T09:19:08.3997772Z 		{
2025-12-04T09:19:08.3998088Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.3998494Z 			"size": 7944,
2025-12-04T09:19:08.3998907Z 			"digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760"
2025-12-04T09:19:08.3999377Z 		},
2025-12-04T09:19:08.3999566Z 		{
2025-12-04T09:19:08.3999886Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.4000294Z 			"size": 8071,
2025-12-04T09:19:08.4000697Z 			"digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea"
2025-12-04T09:19:08.4001178Z 		},
2025-12-04T09:19:08.4001349Z 		{
2025-12-04T09:19:08.4001664Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.4002074Z 			"size": 304,
2025-12-04T09:19:08.4002485Z 			"digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26"
2025-12-04T09:19:08.4002953Z 		},
2025-12-04T09:19:08.4003128Z 		{
2025-12-04T09:19:08.4003448Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.4003844Z 			"size": 13364291,
2025-12-04T09:19:08.4004264Z 			"digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6"
2025-12-04T09:19:08.4004742Z 		},
2025-12-04T09:19:08.4004913Z 		{
2025-12-04T09:19:08.4005232Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.4005650Z 			"size": 108,
2025-12-04T09:19:08.4006061Z 			"digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600"
2025-12-04T09:19:08.4006547Z 		},
2025-12-04T09:19:08.4006808Z 		{
2025-12-04T09:19:08.4007121Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.4007534Z 			"size": 54145699,
2025-12-04T09:19:08.4008314Z 			"digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773"
2025-12-04T09:19:08.4008968Z 		},
2025-12-04T09:19:08.4009226Z 		{
2025-12-04T09:19:08.4009644Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:19:08.4010042Z 			"size": 32,
2025-12-04T09:19:08.4010433Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:19:08.4010904Z 		}
2025-12-04T09:19:08.4011071Z 	]
2025-12-04T09:19:08.4011239Z }
2025-12-04T09:19:08.4011427Z + exit 0
2025-12-04T09:19:08.4037435Z ##[group]Run set -eux
2025-12-04T09:19:08.4037692Z set -eux
2025-12-04T09:19:08.4038086Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have
2025-12-04T09:19:08.4039391Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true
2025-12-04T09:19:08.4050107Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:08.4050474Z env:
2025-12-04T09:19:08.4050671Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:08.4050918Z ##[endgroup]
2025-12-04T09:19:08.4085074Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token
2025-12-04T09:19:08.4085811Z + jq --raw-output .SecretString
2025-12-04T09:19:08.4087199Z + jq -r .docker_hub_readonly_token
2025-12-04T09:19:08.4088926Z + docker login --username pytorchbot --password-stdin
2025-12-04T09:19:08.9911833Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:19:08.9912442Z Configure a credential helper to remove this warning. See
2025-12-04T09:19:08.9913007Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:19:08.9913411Z 
2025-12-04T09:19:08.9913706Z Login Succeeded
2025-12-04T09:19:09.0043136Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T09:19:09.0043510Z tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T09:19:09.0043916Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}"
2025-12-04T09:19:09.0053126Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:09.0053493Z env:
2025-12-04T09:19:09.0053699Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:09.0054558Z   ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.0055430Z ##[endgroup]
2025-12-04T09:19:09.0088077Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.0136554Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main
2025-12-04T09:19:09.0136998Z with:
2025-12-04T09:19:09.0137776Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.0138802Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:09.0139288Z env:
2025-12-04T09:19:09.0139482Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:09.0139733Z ##[endgroup]
2025-12-04T09:19:09.0154521Z ##[group]Run set -x
2025-12-04T09:19:09.0154784Z set -x
2025-12-04T09:19:09.0154994Z set +e
2025-12-04T09:19:09.0155202Z 
2025-12-04T09:19:09.0155395Z login() {
2025-12-04T09:19:09.0155852Z   aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"
2025-12-04T09:19:09.0156366Z }
2025-12-04T09:19:09.0156563Z 
2025-12-04T09:19:09.0156783Z retry () {
2025-12-04T09:19:09.0157035Z   $*  || (sleep 1 && $*) || (sleep 2 && $*)
2025-12-04T09:19:09.0157533Z }
2025-12-04T09:19:09.0157751Z 
2025-12-04T09:19:09.0157968Z retry login "${DOCKER_REGISTRY}"
2025-12-04T09:19:09.0158259Z 
2025-12-04T09:19:09.0158741Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024')
2025-12-04T09:19:09.0159409Z echo "Compressed size of image in MB: ${IMAGE_SIZE}"
2025-12-04T09:19:09.0159769Z 
2025-12-04T09:19:09.0159965Z set -e
2025-12-04T09:19:09.0160297Z # ignore output since only exit code is used for conditional
2025-12-04T09:19:09.0160790Z # only pull docker image if it's not available locally
2025-12-04T09:19:09.0161321Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then
2025-12-04T09:19:09.0161832Z   retry docker pull "${DOCKER_IMAGE}"
2025-12-04T09:19:09.0162148Z fi
2025-12-04T09:19:09.0171071Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:09.0171452Z env:
2025-12-04T09:19:09.0171652Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:09.0172488Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.0173453Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:09.0173833Z ##[endgroup]
2025-12-04T09:19:09.0204476Z + set +e
2025-12-04T09:19:09.0204767Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:09.0205188Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:09.0209229Z + aws ecr get-login-password --region us-east-1
2025-12-04T09:19:09.0214434Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:19:09.5545405Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:19:09.5546012Z Configure a credential helper to remove this warning. See
2025-12-04T09:19:09.5546591Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:19:09.5546980Z 
2025-12-04T09:19:09.5547488Z Login Succeeded
2025-12-04T09:19:09.5577708Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.5578731Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024'
2025-12-04T09:19:09.7777658Z + IMAGE_SIZE=15091.581844329834
2025-12-04T09:19:09.7778091Z Compressed size of image in MB: 15091.581844329834
2025-12-04T09:19:09.7778514Z + echo 'Compressed size of image in MB: 15091.581844329834'
2025-12-04T09:19:09.7778881Z + set -e
2025-12-04T09:19:09.7780197Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.7932778Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:09.7934254Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:19:10.0336551Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image
2025-12-04T09:19:10.0339394Z 63e5bc7682b8: Pulling fs layer
2025-12-04T09:19:10.0340129Z 0678d56345c9: Pulling fs layer
2025-12-04T09:19:10.0340728Z 45f5c9ddfce7: Pulling fs layer
2025-12-04T09:19:10.0341062Z 086b1df51ac1: Pulling fs layer
2025-12-04T09:19:10.0341445Z fe8a7b64bf98: Pulling fs layer
2025-12-04T09:19:10.0341782Z 7680723e9a57: Pulling fs layer
2025-12-04T09:19:10.0342169Z 9c5027aeeb4e: Pulling fs layer
2025-12-04T09:19:10.0342557Z 9a5652110360: Pulling fs layer
2025-12-04T09:19:10.0342931Z 375c4427e914: Pulling fs layer
2025-12-04T09:19:10.0343637Z a86faaa7dbdd: Pulling fs layer
2025-12-04T09:19:10.0344003Z fb7848686804: Pulling fs layer
2025-12-04T09:19:10.0344358Z 3541df015cdb: Pulling fs layer
2025-12-04T09:19:10.0344734Z 79dc80f426b2: Pulling fs layer
2025-12-04T09:19:10.0345103Z a13fcc1b90bb: Pulling fs layer
2025-12-04T09:19:10.0345476Z 4f4fb700ef54: Pulling fs layer
2025-12-04T09:19:10.0345842Z 549db4d6c618: Pulling fs layer
2025-12-04T09:19:10.0346194Z 5c63528cb580: Pulling fs layer
2025-12-04T09:19:10.0346560Z 75bd83b989a4: Pulling fs layer
2025-12-04T09:19:10.0346922Z de6e78970f51: Pulling fs layer
2025-12-04T09:19:10.0347275Z e13ed7c7e473: Pulling fs layer
2025-12-04T09:19:10.0347600Z fe8a7b64bf98: Waiting
2025-12-04T09:19:10.0347981Z 6e2949bcb741: Pulling fs layer
2025-12-04T09:19:10.0348391Z 14d69d9aaec7: Pulling fs layer
2025-12-04T09:19:10.0348743Z 7680723e9a57: Waiting
2025-12-04T09:19:10.0349094Z 5c02769dd8e5: Pulling fs layer
2025-12-04T09:19:10.0349470Z 35041ce524ac: Pulling fs layer
2025-12-04T09:19:10.0349841Z 9c5027aeeb4e: Waiting
2025-12-04T09:19:10.0350172Z 2fa92dc5885e: Pulling fs layer
2025-12-04T09:19:10.0350535Z 2b85eafbd92a: Pulling fs layer
2025-12-04T09:19:10.0350909Z ff755a4ddad7: Pulling fs layer
2025-12-04T09:19:10.0351292Z 09eb41bdf42d: Pulling fs layer
2025-12-04T09:19:10.0351622Z 11ede4d59e93: Pulling fs layer
2025-12-04T09:19:10.0351924Z 9a5652110360: Waiting
2025-12-04T09:19:10.0352252Z 1283cd8f801a: Pulling fs layer
2025-12-04T09:19:10.0352605Z 024fa855425f: Pulling fs layer
2025-12-04T09:19:10.0352968Z 303e6747a62e: Pulling fs layer
2025-12-04T09:19:10.0353330Z 3017cdf4838b: Pulling fs layer
2025-12-04T09:19:10.0353695Z 6b6cd1c358e8: Pulling fs layer
2025-12-04T09:19:10.0354034Z a13fcc1b90bb: Waiting
2025-12-04T09:19:10.0354252Z 375c4427e914: Waiting
2025-12-04T09:19:10.0354484Z b2dd04501124: Pulling fs layer
2025-12-04T09:19:10.0354754Z 55adc51fe589: Pulling fs layer
2025-12-04T09:19:10.0355035Z a43ca0e4b837: Pulling fs layer
2025-12-04T09:19:10.0355285Z 086b1df51ac1: Waiting
2025-12-04T09:19:10.0355521Z b7212f17fd14: Pulling fs layer
2025-12-04T09:19:10.0355785Z 083e42cac090: Pulling fs layer
2025-12-04T09:19:10.0356047Z 0a00b784a4aa: Pulling fs layer
2025-12-04T09:19:10.0356294Z 303e6747a62e: Waiting
2025-12-04T09:19:10.0356522Z a86faaa7dbdd: Waiting
2025-12-04T09:19:10.0356744Z fb7848686804: Waiting
2025-12-04T09:19:10.0356960Z 5c63528cb580: Waiting
2025-12-04T09:19:10.0357179Z 3541df015cdb: Waiting
2025-12-04T09:19:10.0357416Z c6173c779f7b: Pulling fs layer
2025-12-04T09:19:10.0357660Z b7212f17fd14: Waiting
2025-12-04T09:19:10.0357891Z ed3d1e3387b9: Pulling fs layer
2025-12-04T09:19:10.0369052Z 79dc80f426b2: Waiting
2025-12-04T09:19:10.0369327Z 4f4fb700ef54: Waiting
2025-12-04T09:19:10.0369557Z 75bd83b989a4: Waiting
2025-12-04T09:19:10.0369773Z 6b6cd1c358e8: Waiting
2025-12-04T09:19:10.0369992Z 549db4d6c618: Waiting
2025-12-04T09:19:10.0370452Z b2dd04501124: Waiting
2025-12-04T09:19:10.0370693Z b29343478586: Pulling fs layer
2025-12-04T09:19:10.0370970Z c6f0520487fb: Pulling fs layer
2025-12-04T09:19:10.0371241Z 148171691cd4: Pulling fs layer
2025-12-04T09:19:10.0371489Z de6e78970f51: Waiting
2025-12-04T09:19:10.0371726Z 2c666d30ed77: Pulling fs layer
2025-12-04T09:19:10.0371983Z e13ed7c7e473: Waiting
2025-12-04T09:19:10.0372216Z 5d8d3a0a98e0: Pulling fs layer
2025-12-04T09:19:10.0372477Z 0a00b784a4aa: Waiting
2025-12-04T09:19:10.0372717Z b06bafce9e81: Pulling fs layer
2025-12-04T09:19:10.0373048Z 2b85eafbd92a: Waiting
2025-12-04T09:19:10.0373293Z 6e2949bcb741: Waiting
2025-12-04T09:19:10.0373679Z 15e0d7e4590d: Pulling fs layer
2025-12-04T09:19:10.0373945Z a514bd1add31: Pulling fs layer
2025-12-04T09:19:10.0374195Z 09eb41bdf42d: Waiting
2025-12-04T09:19:10.0374426Z 11ede4d59e93: Waiting
2025-12-04T09:19:10.0374644Z c6173c779f7b: Waiting
2025-12-04T09:19:10.0374869Z 57b84ee60002: Pulling fs layer
2025-12-04T09:19:10.0375127Z b06bafce9e81: Waiting
2025-12-04T09:19:10.0375370Z 5d8d3a0a98e0: Waiting
2025-12-04T09:19:10.0375640Z 15e0d7e4590d: Waiting
2025-12-04T09:19:10.0375896Z b8babeff6d81: Pulling fs layer
2025-12-04T09:19:10.0376266Z 2c666d30ed77: Waiting
2025-12-04T09:19:10.0376477Z ff755a4ddad7: Waiting
2025-12-04T09:19:10.0376696Z 35041ce524ac: Waiting
2025-12-04T09:19:10.0376922Z 83779ddf6a85: Pulling fs layer
2025-12-04T09:19:10.0377177Z b29343478586: Waiting
2025-12-04T09:19:10.0377394Z 57b84ee60002: Waiting
2025-12-04T09:19:10.0377623Z 8b7620c0d736: Pulling fs layer
2025-12-04T09:19:10.0377868Z 5c02769dd8e5: Waiting
2025-12-04T09:19:10.0378159Z b8babeff6d81: Waiting
2025-12-04T09:19:10.0378490Z 3bcfa090e4ef: Pulling fs layer
2025-12-04T09:19:10.0378842Z eb0504ec4d92: Pulling fs layer
2025-12-04T09:19:10.0379257Z 8b7620c0d736: Waiting
2025-12-04T09:19:10.0379558Z 83779ddf6a85: Waiting
2025-12-04T09:19:10.0379778Z 2fa92dc5885e: Waiting
2025-12-04T09:19:10.0379996Z 3bcfa090e4ef: Waiting
2025-12-04T09:19:10.0380233Z 15d0fec09d7b: Pulling fs layer
2025-12-04T09:19:10.0380498Z 148171691cd4: Waiting
2025-12-04T09:19:10.0380723Z cca81fcc62a9: Pulling fs layer
2025-12-04T09:19:10.0380982Z eb0504ec4d92: Waiting
2025-12-04T09:19:10.0381196Z 083e42cac090: Waiting
2025-12-04T09:19:10.0381412Z c6f0520487fb: Waiting
2025-12-04T09:19:10.0381634Z cca81fcc62a9: Waiting
2025-12-04T09:19:10.0381871Z b0b8f9b5c6ab: Pulling fs layer
2025-12-04T09:19:10.0382129Z a514bd1add31: Waiting
2025-12-04T09:19:10.0382440Z 0606ca4d47a8: Pulling fs layer
2025-12-04T09:19:10.0382748Z 2f80a4e1b3b9: Pulling fs layer
2025-12-04T09:19:10.0383006Z 35c916fb1bd0: Pulling fs layer
2025-12-04T09:19:10.0383276Z 195537b7dafc: Pulling fs layer
2025-12-04T09:19:10.0383523Z 14d69d9aaec7: Waiting
2025-12-04T09:19:10.0383748Z dc454fd3967e: Pulling fs layer
2025-12-04T09:19:10.0384004Z 2f80a4e1b3b9: Waiting
2025-12-04T09:19:10.0384235Z 701b34f115fa: Pulling fs layer
2025-12-04T09:19:10.0384501Z 39cefc00ffed: Pulling fs layer
2025-12-04T09:19:10.0384792Z 6ae51eb61a32: Pulling fs layer
2025-12-04T09:19:10.0385066Z dc454fd3967e: Waiting
2025-12-04T09:19:10.0385285Z 701b34f115fa: Waiting
2025-12-04T09:19:10.0385498Z b0b8f9b5c6ab: Waiting
2025-12-04T09:19:10.0385724Z 39cefc00ffed: Waiting
2025-12-04T09:19:10.0385952Z 1fd5341e66df: Pulling fs layer
2025-12-04T09:19:10.0386201Z ed3d1e3387b9: Waiting
2025-12-04T09:19:10.0386427Z 6ae51eb61a32: Waiting
2025-12-04T09:19:10.0386655Z 72a7c87e35e4: Pulling fs layer
2025-12-04T09:19:10.0386898Z 0606ca4d47a8: Waiting
2025-12-04T09:19:10.0387112Z 1fd5341e66df: Waiting
2025-12-04T09:19:10.0387349Z ec36862ac98e: Pulling fs layer
2025-12-04T09:19:10.0387597Z 35c916fb1bd0: Waiting
2025-12-04T09:19:10.0387836Z 05ddbf246e8a: Pulling fs layer
2025-12-04T09:19:10.0388093Z ec36862ac98e: Waiting
2025-12-04T09:19:10.0388306Z 05ddbf246e8a: Waiting
2025-12-04T09:19:10.0388525Z 195537b7dafc: Waiting
2025-12-04T09:19:10.0388745Z 55adc51fe589: Waiting
2025-12-04T09:19:10.0388953Z 024fa855425f: Waiting
2025-12-04T09:19:10.0389171Z 1283cd8f801a: Waiting
2025-12-04T09:19:10.0389386Z 3017cdf4838b: Waiting
2025-12-04T09:19:10.1112927Z 0678d56345c9: Verifying Checksum
2025-12-04T09:19:10.1113370Z 0678d56345c9: Download complete
2025-12-04T09:19:10.1987221Z 086b1df51ac1: Verifying Checksum
2025-12-04T09:19:10.1987566Z 086b1df51ac1: Download complete
2025-12-04T09:19:10.2709975Z fe8a7b64bf98: Download complete
2025-12-04T09:19:10.3366706Z 7680723e9a57: Verifying Checksum
2025-12-04T09:19:10.3367163Z 7680723e9a57: Download complete
2025-12-04T09:19:10.3991806Z 63e5bc7682b8: Verifying Checksum
2025-12-04T09:19:10.3992288Z 63e5bc7682b8: Download complete
2025-12-04T09:19:10.4140939Z 9c5027aeeb4e: Verifying Checksum
2025-12-04T09:19:10.4154760Z 9c5027aeeb4e: Download complete
2025-12-04T09:19:10.4583590Z 9a5652110360: Download complete
2025-12-04T09:19:10.5312490Z a86faaa7dbdd: Verifying Checksum
2025-12-04T09:19:10.5312836Z a86faaa7dbdd: Download complete
2025-12-04T09:19:10.6046596Z fb7848686804: Download complete
2025-12-04T09:19:10.6751391Z 3541df015cdb: Verifying Checksum
2025-12-04T09:19:10.6751772Z 3541df015cdb: Download complete
2025-12-04T09:19:10.7356489Z 79dc80f426b2: Verifying Checksum
2025-12-04T09:19:10.7356819Z 79dc80f426b2: Download complete
2025-12-04T09:19:11.5660755Z 375c4427e914: Verifying Checksum
2025-12-04T09:19:11.5661220Z 375c4427e914: Download complete
2025-12-04T09:19:11.5748424Z 4f4fb700ef54: Verifying Checksum
2025-12-04T09:19:11.5748760Z 4f4fb700ef54: Download complete
2025-12-04T09:19:11.6316163Z 63e5bc7682b8: Pull complete
2025-12-04T09:19:11.6440279Z 549db4d6c618: Verifying Checksum
2025-12-04T09:19:11.6440612Z 549db4d6c618: Download complete
2025-12-04T09:19:11.6549321Z 0678d56345c9: Pull complete
2025-12-04T09:19:11.7542980Z 5c63528cb580: Download complete
2025-12-04T09:19:11.8643820Z 75bd83b989a4: Verifying Checksum
2025-12-04T09:19:11.8644113Z 75bd83b989a4: Download complete
2025-12-04T09:19:11.9792253Z de6e78970f51: Verifying Checksum
2025-12-04T09:19:11.9792709Z de6e78970f51: Download complete
2025-12-04T09:19:12.0519732Z e13ed7c7e473: Download complete
2025-12-04T09:19:12.1082669Z 6e2949bcb741: Verifying Checksum
2025-12-04T09:19:12.1082968Z 6e2949bcb741: Download complete
2025-12-04T09:19:12.2047661Z 14d69d9aaec7: Verifying Checksum
2025-12-04T09:19:12.2047948Z 14d69d9aaec7: Download complete
2025-12-04T09:19:12.2972458Z 5c02769dd8e5: Verifying Checksum
2025-12-04T09:19:12.2972747Z 5c02769dd8e5: Download complete
2025-12-04T09:19:13.2106194Z 45f5c9ddfce7: Verifying Checksum
2025-12-04T09:19:13.2106528Z 45f5c9ddfce7: Download complete
2025-12-04T09:19:13.2820869Z 2fa92dc5885e: Verifying Checksum
2025-12-04T09:19:13.2821214Z 2fa92dc5885e: Download complete
2025-12-04T09:19:13.6635879Z 2b85eafbd92a: Verifying Checksum
2025-12-04T09:19:13.6636216Z 2b85eafbd92a: Download complete
2025-12-04T09:19:13.7565460Z ff755a4ddad7: Verifying Checksum
2025-12-04T09:19:13.7565923Z ff755a4ddad7: Download complete
2025-12-04T09:19:13.8266209Z 09eb41bdf42d: Verifying Checksum
2025-12-04T09:19:13.8266621Z 09eb41bdf42d: Download complete
2025-12-04T09:19:18.4581359Z 11ede4d59e93: Verifying Checksum
2025-12-04T09:19:18.4581792Z 11ede4d59e93: Download complete
2025-12-04T09:19:18.5360581Z 1283cd8f801a: Verifying Checksum
2025-12-04T09:19:18.5361084Z 1283cd8f801a: Download complete
2025-12-04T09:19:18.6186145Z 024fa855425f: Verifying Checksum
2025-12-04T09:19:18.6186630Z 024fa855425f: Download complete
2025-12-04T09:19:18.7141705Z 303e6747a62e: Download complete
2025-12-04T09:19:18.8105478Z 3017cdf4838b: Download complete
2025-12-04T09:19:19.0509326Z 6b6cd1c358e8: Verifying Checksum
2025-12-04T09:19:19.0509674Z 6b6cd1c358e8: Download complete
2025-12-04T09:19:19.1290624Z b2dd04501124: Verifying Checksum
2025-12-04T09:19:19.1290948Z b2dd04501124: Download complete
2025-12-04T09:19:19.2212522Z 55adc51fe589: Verifying Checksum
2025-12-04T09:19:19.2212871Z 55adc51fe589: Download complete
2025-12-04T09:19:19.2954066Z a43ca0e4b837: Verifying Checksum
2025-12-04T09:19:19.2954500Z a43ca0e4b837: Download complete
2025-12-04T09:19:19.3730794Z b7212f17fd14: Verifying Checksum
2025-12-04T09:19:19.3731610Z b7212f17fd14: Download complete
2025-12-04T09:19:19.4689481Z 083e42cac090: Verifying Checksum
2025-12-04T09:19:19.4689912Z 083e42cac090: Download complete
2025-12-04T09:19:19.5530578Z 0a00b784a4aa: Verifying Checksum
2025-12-04T09:19:19.5530919Z 0a00b784a4aa: Download complete
2025-12-04T09:19:19.6343747Z c6173c779f7b: Download complete
2025-12-04T09:19:21.1438209Z ed3d1e3387b9: Verifying Checksum
2025-12-04T09:19:21.1438557Z ed3d1e3387b9: Download complete
2025-12-04T09:19:21.2234376Z b29343478586: Verifying Checksum
2025-12-04T09:19:21.2234817Z b29343478586: Download complete
2025-12-04T09:19:22.9228479Z 45f5c9ddfce7: Pull complete
2025-12-04T09:19:23.0184336Z 086b1df51ac1: Pull complete
2025-12-04T09:19:23.1142588Z fe8a7b64bf98: Pull complete
2025-12-04T09:19:23.1958554Z 7680723e9a57: Pull complete
2025-12-04T09:19:23.4051959Z 9c5027aeeb4e: Pull complete
2025-12-04T09:19:23.6305179Z 9a5652110360: Pull complete
2025-12-04T09:19:24.4094502Z c6f0520487fb: Verifying Checksum
2025-12-04T09:19:24.4094999Z c6f0520487fb: Download complete
2025-12-04T09:19:26.3061219Z 375c4427e914: Pull complete
2025-12-04T09:19:26.5161542Z a86faaa7dbdd: Pull complete
2025-12-04T09:19:26.7282534Z fb7848686804: Pull complete
2025-12-04T09:19:26.9268436Z 3541df015cdb: Pull complete
2025-12-04T09:19:27.1275372Z 79dc80f426b2: Pull complete
2025-12-04T09:19:42.6380951Z a13fcc1b90bb: Verifying Checksum
2025-12-04T09:19:42.6381416Z a13fcc1b90bb: Download complete
2025-12-04T09:19:42.7411886Z 2c666d30ed77: Verifying Checksum
2025-12-04T09:19:42.7412325Z 2c666d30ed77: Download complete
2025-12-04T09:19:42.8109852Z 5d8d3a0a98e0: Verifying Checksum
2025-12-04T09:19:42.8110565Z 5d8d3a0a98e0: Download complete
2025-12-04T09:19:42.8794555Z b06bafce9e81: Verifying Checksum
2025-12-04T09:19:42.8794958Z b06bafce9e81: Download complete
2025-12-04T09:19:42.9523808Z 15e0d7e4590d: Download complete
2025-12-04T09:19:43.0345089Z a514bd1add31: Verifying Checksum
2025-12-04T09:19:43.0345531Z a514bd1add31: Download complete
2025-12-04T09:19:43.1341843Z 57b84ee60002: Verifying Checksum
2025-12-04T09:19:43.1342224Z 57b84ee60002: Download complete
2025-12-04T09:19:43.2338446Z b8babeff6d81: Verifying Checksum
2025-12-04T09:19:43.2338943Z b8babeff6d81: Download complete
2025-12-04T09:19:43.3023835Z 83779ddf6a85: Verifying Checksum
2025-12-04T09:19:43.3024215Z 83779ddf6a85: Download complete
2025-12-04T09:19:43.3743007Z 8b7620c0d736: Download complete
2025-12-04T09:19:43.4591293Z 3bcfa090e4ef: Verifying Checksum
2025-12-04T09:19:43.4591674Z 3bcfa090e4ef: Download complete
2025-12-04T09:19:43.5421198Z eb0504ec4d92: Download complete
2025-12-04T09:19:43.6280066Z 15d0fec09d7b: Verifying Checksum
2025-12-04T09:19:43.6280502Z 15d0fec09d7b: Download complete
2025-12-04T09:19:43.7248867Z cca81fcc62a9: Verifying Checksum
2025-12-04T09:19:43.7249208Z cca81fcc62a9: Download complete
2025-12-04T09:19:43.8131312Z b0b8f9b5c6ab: Verifying Checksum
2025-12-04T09:19:43.8132374Z b0b8f9b5c6ab: Download complete
2025-12-04T09:19:43.8986912Z 0606ca4d47a8: Download complete
2025-12-04T09:19:43.9769337Z 2f80a4e1b3b9: Verifying Checksum
2025-12-04T09:19:43.9769653Z 2f80a4e1b3b9: Download complete
2025-12-04T09:19:44.0347513Z 35c916fb1bd0: Verifying Checksum
2025-12-04T09:19:44.0347864Z 35c916fb1bd0: Download complete
2025-12-04T09:19:46.0474952Z 195537b7dafc: Verifying Checksum
2025-12-04T09:19:46.0475285Z 195537b7dafc: Download complete
2025-12-04T09:19:46.1293413Z dc454fd3967e: Download complete
2025-12-04T09:19:46.2151110Z 701b34f115fa: Verifying Checksum
2025-12-04T09:19:46.2151551Z 701b34f115fa: Download complete
2025-12-04T09:19:46.2858922Z 39cefc00ffed: Download complete
2025-12-04T09:19:46.3715397Z 6ae51eb61a32: Verifying Checksum
2025-12-04T09:19:46.3715735Z 6ae51eb61a32: Download complete
2025-12-04T09:19:46.4663726Z 1fd5341e66df: Verifying Checksum
2025-12-04T09:19:46.4664082Z 1fd5341e66df: Download complete
2025-12-04T09:19:46.6602909Z 72a7c87e35e4: Verifying Checksum
2025-12-04T09:19:46.6603342Z 72a7c87e35e4: Download complete
2025-12-04T09:19:46.7247407Z ec36862ac98e: Download complete
2025-12-04T09:19:47.3177908Z 05ddbf246e8a: Verifying Checksum
2025-12-04T09:19:47.3178863Z 05ddbf246e8a: Download complete
2025-12-04T09:19:55.0321359Z 148171691cd4: Verifying Checksum
2025-12-04T09:19:55.0321713Z 148171691cd4: Download complete
2025-12-04T09:20:31.5538197Z 35041ce524ac: Verifying Checksum
2025-12-04T09:20:31.5538623Z 35041ce524ac: Download complete
2025-12-04T09:21:06.1079227Z a13fcc1b90bb: Pull complete
2025-12-04T09:21:06.3286269Z 4f4fb700ef54: Pull complete
2025-12-04T09:21:06.5405864Z 549db4d6c618: Pull complete
2025-12-04T09:21:06.8044012Z 5c63528cb580: Pull complete
2025-12-04T09:21:07.0202322Z 75bd83b989a4: Pull complete
2025-12-04T09:21:07.3236364Z de6e78970f51: Pull complete
2025-12-04T09:21:07.4639602Z e13ed7c7e473: Pull complete
2025-12-04T09:21:07.5994021Z 6e2949bcb741: Pull complete
2025-12-04T09:21:07.6775196Z 14d69d9aaec7: Pull complete
2025-12-04T09:21:07.8481633Z 5c02769dd8e5: Pull complete
2025-12-04T09:22:39.2137886Z 35041ce524ac: Pull complete
2025-12-04T09:22:39.4206554Z 2fa92dc5885e: Pull complete
2025-12-04T09:22:40.2673255Z 2b85eafbd92a: Pull complete
2025-12-04T09:22:40.4770511Z ff755a4ddad7: Pull complete
2025-12-04T09:22:40.6812386Z 09eb41bdf42d: Pull complete
2025-12-04T09:22:49.4302225Z 11ede4d59e93: Pull complete
2025-12-04T09:22:49.6407347Z 1283cd8f801a: Pull complete
2025-12-04T09:22:49.8454793Z 024fa855425f: Pull complete
2025-12-04T09:22:50.2823010Z 303e6747a62e: Pull complete
2025-12-04T09:22:50.4962300Z 3017cdf4838b: Pull complete
2025-12-04T09:22:50.9099480Z 6b6cd1c358e8: Pull complete
2025-12-04T09:22:51.1191903Z b2dd04501124: Pull complete
2025-12-04T09:22:51.3277095Z 55adc51fe589: Pull complete
2025-12-04T09:22:51.7673707Z a43ca0e4b837: Pull complete
2025-12-04T09:22:51.9938981Z b7212f17fd14: Pull complete
2025-12-04T09:22:52.2073946Z 083e42cac090: Pull complete
2025-12-04T09:22:52.6481425Z 0a00b784a4aa: Pull complete
2025-12-04T09:22:52.8653805Z c6173c779f7b: Pull complete
2025-12-04T09:22:56.6709554Z ed3d1e3387b9: Pull complete
2025-12-04T09:22:56.8891227Z b29343478586: Pull complete
2025-12-04T09:22:58.3145897Z c6f0520487fb: Pull complete
2025-12-04T09:23:59.5182792Z 148171691cd4: Pull complete
2025-12-04T09:23:59.5927006Z 2c666d30ed77: Pull complete
2025-12-04T09:23:59.7234329Z 5d8d3a0a98e0: Pull complete
2025-12-04T09:23:59.9560280Z b06bafce9e81: Pull complete
2025-12-04T09:24:00.4018162Z 15e0d7e4590d: Pull complete
2025-12-04T09:24:00.5988535Z a514bd1add31: Pull complete
2025-12-04T09:24:01.0067901Z 57b84ee60002: Pull complete
2025-12-04T09:24:01.4286387Z b8babeff6d81: Pull complete
2025-12-04T09:24:01.6440834Z 83779ddf6a85: Pull complete
2025-12-04T09:24:02.0596479Z 8b7620c0d736: Pull complete
2025-12-04T09:24:02.3944507Z 3bcfa090e4ef: Pull complete
2025-12-04T09:24:02.5363985Z eb0504ec4d92: Pull complete
2025-12-04T09:24:02.7512373Z 15d0fec09d7b: Pull complete
2025-12-04T09:24:02.8952034Z cca81fcc62a9: Pull complete
2025-12-04T09:24:03.1778494Z b0b8f9b5c6ab: Pull complete
2025-12-04T09:24:03.3862628Z 0606ca4d47a8: Pull complete
2025-12-04T09:24:03.5430395Z 2f80a4e1b3b9: Pull complete
2025-12-04T09:24:03.5783340Z 35c916fb1bd0: Pull complete
2025-12-04T09:24:10.4119673Z 195537b7dafc: Pull complete
2025-12-04T09:24:10.6210457Z dc454fd3967e: Pull complete
2025-12-04T09:24:10.8358793Z 701b34f115fa: Pull complete
2025-12-04T09:24:11.0581011Z 39cefc00ffed: Pull complete
2025-12-04T09:24:11.2286198Z 6ae51eb61a32: Pull complete
2025-12-04T09:24:11.3612991Z 1fd5341e66df: Pull complete
2025-12-04T09:24:13.2011268Z 72a7c87e35e4: Pull complete
2025-12-04T09:24:13.4104513Z ec36862ac98e: Pull complete
2025-12-04T09:24:15.2256258Z 05ddbf246e8a: Pull complete
2025-12-04T09:24:15.5764513Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97
2025-12-04T09:24:15.6187607Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:24:15.6371344Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:24:15.6455273Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:24:15.6456438Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:24:15.6467484Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:24:15.6467872Z env:
2025-12-04T09:24:15.6468085Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:24:15.6468348Z ##[endgroup]
2025-12-04T09:24:15.6666918Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main
2025-12-04T09:24:15.6667334Z with:
2025-12-04T09:24:15.6667553Z   driver-version: 580.82.07
2025-12-04T09:24:15.6667802Z env:
2025-12-04T09:24:15.6668008Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:24:15.6668267Z ##[endgroup]
2025-12-04T09:24:15.6786284Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:24:15.6787418Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:24:15.6796644Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:24:15.6797007Z env:
2025-12-04T09:24:15.6797210Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:24:15.6797446Z ##[endgroup]
2025-12-04T09:24:15.6877624Z ##[group]Run set -euo pipefail
2025-12-04T09:24:15.6877973Z set -euo pipefail
2025-12-04T09:24:15.6878275Z 
2025-12-04T09:24:15.6878471Z has_gpu=false
2025-12-04T09:24:15.6878715Z devices=""
2025-12-04T09:24:15.6878932Z 
2025-12-04T09:24:15.6879189Z if command -v nvidia-smi >/dev/null 2>&1; then
2025-12-04T09:24:15.6879631Z   if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:24:15.6880009Z     has_gpu=true
2025-12-04T09:24:15.6880292Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:24:15.6880595Z   fi
2025-12-04T09:24:15.6880803Z fi
2025-12-04T09:24:15.6880992Z 
2025-12-04T09:24:15.6881202Z if [ "$has_gpu" = false ]; then
2025-12-04T09:24:15.6881592Z   if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:24:15.6881967Z     has_gpu=true
2025-12-04T09:24:15.6882249Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:24:15.6882549Z   fi
2025-12-04T09:24:15.6882745Z fi
2025-12-04T09:24:15.6882943Z 
2025-12-04T09:24:15.6883240Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then
2025-12-04T09:24:15.6883741Z   if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:24:15.6884147Z     has_gpu=true
2025-12-04T09:24:15.6884422Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:24:15.6884727Z   fi
2025-12-04T09:24:15.6884922Z fi
2025-12-04T09:24:15.6885111Z 
2025-12-04T09:24:15.6885411Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT"
2025-12-04T09:24:15.6885944Z printf 'DETECTED_DEVICES<<EOF\n%s\nEOF\n' "$devices" >> "$GITHUB_OUTPUT"
2025-12-04T09:24:15.6894381Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:24:15.6894737Z env:
2025-12-04T09:24:15.6894941Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:24:15.6895188Z ##[endgroup]
2025-12-04T09:24:17.4378890Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:24:17.4379625Z if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:24:17.4380193Z   echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}"
2025-12-04T09:24:17.4380954Z   echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}"
2025-12-04T09:24:17.4381642Z else
2025-12-04T09:24:17.4382052Z   echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}"
2025-12-04T09:24:17.4382577Z fi
2025-12-04T09:24:17.4395396Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:24:17.4395976Z env:
2025-12-04T09:24:17.4396295Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:24:17.4396924Z   HAS_NVIDIA: true
2025-12-04T09:24:17.4397252Z ##[endgroup]
2025-12-04T09:24:17.4482662Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
2025-12-04T09:24:17.4483084Z with:
2025-12-04T09:24:17.4483280Z   timeout_minutes: 10
2025-12-04T09:24:17.4483518Z   max_attempts: 3
2025-12-04T09:24:17.4512320Z   command: # Is it disgusting to have a full shell script here in this github action? Sure
# But is it the best way to make it so that this action relies on nothing else? Absolutely
set -eou pipefail

DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

install_nvidia_docker2_amzn2() {
    (
        set -x
        # Needed for yum-config-manager
        sudo yum install -y yum-utils
        if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then
          YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo"
        else
          # Amazon Linux 2
          YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"
        fi

        sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
        sudo yum install -y \
          nvidia-container-toolkit-1.17.8 \
          libnvidia-container-tools-1.17.8 \
          libnvidia-container1-1.17.8 \
          nvidia-container-toolkit-base-1.17.8
        sudo systemctl restart docker
    )
}

install_nvidia_docker2_ubuntu20() {
    (
        set -x
        # Install nvidia-container-toolkit if nvidia-docker2 is not already installed
        status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)"
        if [ ! $? = 0 ] || [ ! "$status" = installed ]; then
          sudo apt-get install -y nvidia-container-toolkit-1.17.8
          sudo systemctl restart docker
        fi
    )
}

pre_install_nvidia_driver_amzn2() {
    (
        # Purge any nvidia driver installed from RHEL repo
        sudo yum remove -y nvidia-driver-latest-dkms
    )
}

install_nvidia_driver_common() {
    (
        # Try to gather more information about the runner and its existing NVIDIA driver if any
        echo "Before installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        HAS_NVIDIA_DRIVER=0
        # Check if NVIDIA driver has already been installed
        if [ -x "$(command -v nvidia-smi)" ]; then
            set +e
            # The driver exists, check its version next. Also check only the first GPU if there is more than one of them
            # so that the same driver version is not printed over multiple lines
            INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
            NVIDIA_SMI_STATUS=$?

            if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing"
            elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing"

                # Turn off persistent mode so that the installation script can unload the kernel module
                sudo killall nvidia-persistenced || true
            else
                HAS_NVIDIA_DRIVER=1
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
            fi
            set -e
        fi

        if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
            # CAUTION: this may need to be updated in future
            if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then
                  sudo yum groupinstall -y "Development Tools"
                  # ensure our kernel install is the same as our underlying kernel,
                  # groupinstall "Development Tools" has a habit of mismatching kernel headers
                  sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
                  sudo modprobe backlight
            fi
            sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"

            set +e
            sudo /bin/bash /tmp/nvidia_driver -s --no-drm
            NVIDIA_INSTALLATION_STATUS=$?

            RESET_GPU=0
            if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then
                sudo cat /var/log/nvidia-installer.log
                # Failed to install the NVIDIA driver, try to reset the GPU
                RESET_GPU=1
            elif [ -x "$(command -v nvidia-smi)" ]; then
                # Check again if nvidia-smi works even if the driver installation completes successfully
                INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
                NVIDIA_SMI_STATUS=$?

                if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                    RESET_GPU=1
                fi
            fi

            if [ "$RESET_GPU" -eq 1 ]; then
                NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1)
                # The GPU can get stuck in a failure state if a test somehow crashes the GPU microcode. When this
                # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388
                for PCI_ID in $NVIDIA_DEVICES; do
                    DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable)

                    echo "Resetting $PCI_ID (enabled state: $DEVICE_ENABLED)"
                    # This requires sudo permission of course
                    echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset
                    sleep 1
                done
            fi

            sudo rm -fv /tmp/nvidia_driver
            set -e
        fi
    )
}

post_install_nvidia_driver_common() {
    (
        sudo modprobe nvidia || true
        echo "After installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        (
            set +e

            nvidia-smi
            # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in
            # the case where the driver has already crashed as it still can get the driver version
            # and some basic information like the bus ID.  However, the rest of the information
            # would be missing (ERR!), for example:
            #
            # +-----------------------------------------------------------------------------+
            # | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
            # |-------------------------------+----------------------+----------------------+
            # | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
            # | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
            # |                               |                      |               MIG M. |
            # |===============================+======================+======================|
            # |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |
            # |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |
            # |                               |                      |                 ERR! |
            # +-------------------------------+----------------------+----------------------+
            #
            # +-----------------------------------------------------------------------------+
            # | Processes:                                                                  |
            # |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
            # |        ID   ID                                                   Usage      |
            # |=============================================================================|
            # +-----------------------------------------------------------------------------+
            #
            # This should be reported as a failure instead, as it is guaranteed to fail when
            # Docker tries to run with --gpus all
            #
            # So, the correct check here is to query one of the missing pieces of info like
            # GPU name, so that the command can fail accordingly
            nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
            NVIDIA_SMI_STATUS=$?

            # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285
            if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then
                echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
            else
                echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}"
                exit "${NVIDIA_SMI_STATUS}"
            fi
            set -e
        )
    )
}

install_nvidia_driver_amzn2() {
    (
        set -x
        pre_install_nvidia_driver_amzn2
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

install_nvidia_driver_ubuntu20() {
    (
        set -x
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_driver_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_driver_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_docker2_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_docker2_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with
# more than one GPU. This only needs to be run once. The command fails
# on subsequent runs and complains that the mode is already on, but that's
# ok
sudo nvidia-persistenced || true
# This should show persistence mode ON
nvidia-smi

# Check that the container toolkit is correctly installed and CUDA is available inside a container
docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi
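
The exit-status check earlier in this script treats nvidia-smi statuses 0 and 14 as acceptable and fails on anything else. A minimal sketch of that same decision, factored into a reusable function, might look like the following. The names is_allowed_nvidia_smi_status and check_gpu_name are hypothetical and do not appear in the original script; the allowed statuses and the messages are copied from it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): returns 0 when the
# given nvidia-smi exit status is one the script above treats as allowed
# (0 or 14), and 1 for any other value.
is_allowed_nvidia_smi_status() {
    local status="$1"
    case "${status}" in
        0|14) return 0 ;;
        *)    return 1 ;;
    esac
}

# Hypothetical wrapper: query the GPU name (a field that is missing when the
# GPU is in the ERR! state), capture the exit status without tripping
# `set -e`, and report it using the same messages as the script above.
check_gpu_name() {
    local status=0
    nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 || status=$?
    if is_allowed_nvidia_smi_status "${status}"; then
        echo "INFO: Ignoring allowed status ${status}"
    else
        echo "ERROR: nvidia-smi exited with unresolved status ${status}"
        return "${status}"
    fi
}
```

Capturing the status with `|| status=$?` is one way to avoid toggling `set +e` / `set -e` around the call; the original script uses the toggle instead, and both approaches leave the status available for the comparison.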

2025-12-04T09:24:17.4540822Z   retry_wait_seconds: 10
2025-12-04T09:24:17.4541088Z   polling_interval_seconds: 1
2025-12-04T09:24:17.4541357Z   warning_on_retry: true
2025-12-04T09:24:17.4541627Z   continue_on_error: false
2025-12-04T09:24:17.4541872Z env:
2025-12-04T09:24:17.4542073Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:24:17.4542332Z   HAS_NVIDIA_GPU: true
2025-12-04T09:24:17.4542639Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:24:17.4543000Z   DRIVER_VERSION: 580.82.07
2025-12-04T09:24:17.4543261Z ##[endgroup]
2025-12-04T09:24:17.5743601Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run ==
2025-12-04T09:24:17.5744459Z + pre_install_nvidia_driver_amzn2
2025-12-04T09:24:17.5746740Z + sudo yum remove -y nvidia-driver-latest-dkms
2025-12-04T09:24:18.2276428Z No match for argument: nvidia-driver-latest-dkms
2025-12-04T09:24:18.2276831Z No packages marked for removal.
2025-12-04T09:24:18.2342061Z Dependencies resolved.
2025-12-04T09:24:18.2352016Z Nothing to do.
2025-12-04T09:24:18.2352670Z Complete!
2025-12-04T09:24:18.2994144Z + install_nvidia_driver_common
2025-12-04T09:24:18.2998503Z + echo 'Before installing NVIDIA driver'
2025-12-04T09:24:18.2999153Z + lspci
2025-12-04T09:24:18.3000701Z Before installing NVIDIA driver
2025-12-04T09:24:18.4340337Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:24:18.4341017Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:24:18.4341613Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:24:18.4342180Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:24:18.4342680Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:24:18.4343269Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:24:18.4343784Z 00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1)
2025-12-04T09:24:18.4344332Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:24:18.4344772Z + lsmod
2025-12-04T09:24:18.4398025Z Module                  Size  Used by
2025-12-04T09:24:18.4398805Z nvidia_uvm           1925120  0
2025-12-04T09:24:18.4399100Z nvidia              14286848  1 nvidia_uvm
2025-12-04T09:24:18.4399391Z drm                   602112  1 nvidia
2025-12-04T09:24:18.4399705Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:24:18.4400108Z backlight              24576  1 drm
2025-12-04T09:24:18.4400495Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:24:18.4400862Z xt_conntrack           16384  1
2025-12-04T09:24:18.4401128Z nft_chain_nat          16384  3
2025-12-04T09:24:18.4401664Z xt_MASQUERADE          20480  1
2025-12-04T09:24:18.4402073Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:24:18.4402524Z nf_conntrack_netlink    57344  0
2025-12-04T09:24:18.4403039Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:24:18.4403498Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:24:18.4403816Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:24:18.4404120Z xfrm_user              57344  1
2025-12-04T09:24:18.4404379Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:24:18.4404672Z xt_addrtype            16384  2
2025-12-04T09:24:18.4404930Z nft_compat             20480  4
2025-12-04T09:24:18.4405232Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:24:18.4405669Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:24:18.4406066Z br_netfilter           36864  0
2025-12-04T09:24:18.4406337Z bridge                323584  1 br_netfilter
2025-12-04T09:24:18.4406647Z stp                    16384  1 bridge
2025-12-04T09:24:18.4406939Z llc                    16384  2 bridge,stp
2025-12-04T09:24:18.4407226Z overlay               167936  0
2025-12-04T09:24:18.4407473Z tls                   139264  0
2025-12-04T09:24:18.4407981Z nls_ascii              16384  1
2025-12-04T09:24:18.4408255Z nls_cp437              20480  1
2025-12-04T09:24:18.4408495Z vfat                   24576  1
2025-12-04T09:24:18.4408774Z fat                    86016  1 vfat
2025-12-04T09:24:18.4409079Z sunrpc                700416  1
2025-12-04T09:24:18.4409318Z i8042                  45056  0
2025-12-04T09:24:18.4409574Z ghash_clmulni_intel    16384  0
2025-12-04T09:24:18.4409836Z serio                  28672  3 i8042
2025-12-04T09:24:18.4410097Z ena                   184320  0
2025-12-04T09:24:18.4410346Z button                 24576  0
2025-12-04T09:24:18.4410597Z sch_fq_codel           20480  17
2025-12-04T09:24:18.4410843Z fuse                  184320  1
2025-12-04T09:24:18.4411092Z loop                   36864  0
2025-12-04T09:24:18.4411339Z dm_mod                188416  0
2025-12-04T09:24:18.4411586Z configfs               57344  1
2025-12-04T09:24:18.4411831Z dmi_sysfs              20480  0
2025-12-04T09:24:18.4412082Z crc32_pclmul           16384  0
2025-12-04T09:24:18.4412332Z crc32c_intel           24576  0
2025-12-04T09:24:18.4412573Z efivarfs               24576  1
2025-12-04T09:24:18.4412827Z + modinfo nvidia
2025-12-04T09:24:18.4420470Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:24:18.4421139Z import_ns:      DMA_BUF
2025-12-04T09:24:18.4421460Z alias:          char-major-195-*
2025-12-04T09:24:18.4421794Z version:        580.82.07
2025-12-04T09:24:18.4422036Z supported:      external
2025-12-04T09:24:18.4422275Z license:        Dual MIT/GPL
2025-12-04T09:24:18.4422567Z firmware:       nvidia/580.82.07/gsp_tu10x.bin
2025-12-04T09:24:18.4423027Z firmware:       nvidia/580.82.07/gsp_ga10x.bin
2025-12-04T09:24:18.4423475Z srcversion:     BA7240A71DCF7DC6FE88C1D
2025-12-04T09:24:18.4423916Z alias:          of:N*T*Cnvidia,tegra264-displayC*
2025-12-04T09:24:18.4424279Z alias:          of:N*T*Cnvidia,tegra264-display
2025-12-04T09:24:18.4424632Z alias:          of:N*T*Cnvidia,tegra234-displayC*
2025-12-04T09:24:18.4424980Z alias:          of:N*T*Cnvidia,tegra234-display
2025-12-04T09:24:18.4425329Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:24:18.4425859Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:24:18.4426198Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:24:18.4426604Z depends:        i2c-core,drm
2025-12-04T09:24:18.4426954Z retpoline:      Y
2025-12-04T09:24:18.4427232Z name:           nvidia
2025-12-04T09:24:18.4427682Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:24:18.4428187Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:24:18.4428682Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:24:18.4429280Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:24:18.4429599Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:24:18.4429976Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:24:18.4430406Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:24:18.4430818Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:24:18.4431231Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:24:18.4431625Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:24:18.4432029Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:24:18.4432370Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:24:18.4432676Z parm:           NVreg_EnableMSI:int
2025-12-04T09:24:18.4432981Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:24:18.4433359Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:24:18.4433895Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:24:18.4434416Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:24:18.4434879Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:24:18.4435302Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:24:18.4435735Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:24:18.4436157Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:24:18.4436507Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:24:18.4436889Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:24:18.4437268Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:24:18.4437617Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:24:18.4437946Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:24:18.4438283Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:24:18.4438637Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:24:18.4438982Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:24:18.4439339Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:24:18.4439705Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:24:18.4440074Z parm:           NVreg_RegisterPlatformDeviceDriver:int
2025-12-04T09:24:18.4440444Z parm:           NVreg_EnableResizableBar:int
2025-12-04T09:24:18.4440781Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:24:18.4441133Z parm:           NVreg_EnableNonblockingOpen:int
2025-12-04T09:24:18.4441502Z parm:           NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:24:18.4441856Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:24:18.4442204Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:24:18.4442552Z parm:           NVreg_RmMsg:charp
2025-12-04T09:24:18.4442845Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:24:18.4443169Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:24:18.4443503Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:24:18.4454510Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:24:18.4454918Z parm:           NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:24:18.4455298Z parm:           NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:24:18.4455661Z parm:           NVreg_ImexChannelCount:int
2025-12-04T09:24:18.4456001Z parm:           NVreg_CreateImexChannel0:int
2025-12-04T09:24:18.4456359Z parm:           NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:24:18.4456705Z parm:           rm_firmware_active:charp
2025-12-04T09:24:18.4457139Z + HAS_NVIDIA_DRIVER=0
2025-12-04T09:24:18.4457387Z ++ command -v nvidia-smi
2025-12-04T09:24:18.4457641Z + '[' -x /usr/bin/nvidia-smi ']'
2025-12-04T09:24:18.4457898Z + set +e
2025-12-04T09:24:18.4458216Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0
2025-12-04T09:24:20.1761839Z + INSTALLED_DRIVER_VERSION=580.82.07
2025-12-04T09:24:20.1762323Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:24:20.1762644Z + '[' 0 -ne 0 ']'
2025-12-04T09:24:20.1762956Z + '[' 580.82.07 '!=' 580.82.07 ']'
2025-12-04T09:24:20.1763686Z + HAS_NVIDIA_DRIVER=1
2025-12-04T09:24:20.1764229Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation'
2025-12-04T09:24:20.1764730Z + set -e
2025-12-04T09:24:20.1764917Z + '[' 1 -eq 0 ']'
2025-12-04T09:24:20.1765326Z NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation
2025-12-04T09:24:20.1767265Z + post_install_nvidia_driver_common
2025-12-04T09:24:20.1770591Z + sudo modprobe nvidia
2025-12-04T09:24:20.3014714Z + echo 'After installing NVIDIA driver'
2025-12-04T09:24:20.3015185Z + lspci
2025-12-04T09:24:20.3015415Z After installing NVIDIA driver
2025-12-04T09:24:20.3139636Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:24:20.3140790Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:24:20.3141860Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:24:20.3142875Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:24:20.3143905Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:24:20.3144942Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:24:20.3146192Z 00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1)
2025-12-04T09:24:20.3147106Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:24:20.3147874Z + lsmod
2025-12-04T09:24:20.3184674Z Module                  Size  Used by
2025-12-04T09:24:20.3185081Z nvidia_uvm           1925120  0
2025-12-04T09:24:20.3185439Z nvidia              14286848  1 nvidia_uvm
2025-12-04T09:24:20.3185846Z drm                   602112  1 nvidia
2025-12-04T09:24:20.3186236Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:24:20.3186554Z backlight              24576  1 drm
2025-12-04T09:24:20.3186889Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:24:20.3187294Z xt_conntrack           16384  1
2025-12-04T09:24:20.3187654Z nft_chain_nat          16384  3
2025-12-04T09:24:20.3188009Z xt_MASQUERADE          20480  1
2025-12-04T09:24:20.3188418Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:24:20.3188781Z nf_conntrack_netlink    57344  0
2025-12-04T09:24:20.3189196Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:24:20.3189665Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:24:20.3189992Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:24:20.3190287Z xfrm_user              57344  1
2025-12-04T09:24:20.3190556Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:24:20.3190852Z xt_addrtype            16384  2
2025-12-04T09:24:20.3191108Z nft_compat             20480  4
2025-12-04T09:24:20.3191418Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:24:20.3191857Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:24:20.3192251Z br_netfilter           36864  0
2025-12-04T09:24:20.3192530Z bridge                323584  1 br_netfilter
2025-12-04T09:24:20.3192834Z stp                    16384  1 bridge
2025-12-04T09:24:20.3193128Z llc                    16384  2 bridge,stp
2025-12-04T09:24:20.3193423Z overlay               167936  0
2025-12-04T09:24:20.3193671Z tls                   139264  0
2025-12-04T09:24:20.3193919Z nls_ascii              16384  1
2025-12-04T09:24:20.3194192Z nls_cp437              20480  1
2025-12-04T09:24:20.3194722Z vfat                   24576  1
2025-12-04T09:24:20.3194988Z fat                    86016  1 vfat
2025-12-04T09:24:20.3195264Z sunrpc                700416  1
2025-12-04T09:24:20.3195504Z i8042                  45056  0
2025-12-04T09:24:20.3195757Z ghash_clmulni_intel    16384  0
2025-12-04T09:24:20.3196019Z serio                  28672  3 i8042
2025-12-04T09:24:20.3196290Z ena                   184320  0
2025-12-04T09:24:20.3196529Z button                 24576  0
2025-12-04T09:24:20.3196780Z sch_fq_codel           20480  17
2025-12-04T09:24:20.3197191Z fuse                  184320  1
2025-12-04T09:24:20.3197427Z loop                   36864  0
2025-12-04T09:24:20.3197675Z dm_mod                188416  0
2025-12-04T09:24:20.3197922Z configfs               57344  1
2025-12-04T09:24:20.3198162Z dmi_sysfs              20480  0
2025-12-04T09:24:20.3198416Z crc32_pclmul           16384  0
2025-12-04T09:24:20.3198671Z crc32c_intel           24576  0
2025-12-04T09:24:20.3198920Z efivarfs               24576  1
2025-12-04T09:24:20.3199203Z + modinfo nvidia
2025-12-04T09:24:20.3204365Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:24:20.3205034Z import_ns:      DMA_BUF
2025-12-04T09:24:20.3205363Z alias:          char-major-195-*
2025-12-04T09:24:20.3205667Z version:        580.82.07
2025-12-04T09:24:20.3205909Z supported:      external
2025-12-04T09:24:20.3206154Z license:        Dual MIT/GPL
2025-12-04T09:24:20.3206432Z firmware:       nvidia/580.82.07/gsp_tu10x.bin
2025-12-04T09:24:20.3206786Z firmware:       nvidia/580.82.07/gsp_ga10x.bin
2025-12-04T09:24:20.3207115Z srcversion:     BA7240A71DCF7DC6FE88C1D
2025-12-04T09:24:20.3207442Z alias:          of:N*T*Cnvidia,tegra264-displayC*
2025-12-04T09:24:20.3208047Z alias:          of:N*T*Cnvidia,tegra264-display
2025-12-04T09:24:20.3208406Z alias:          of:N*T*Cnvidia,tegra234-displayC*
2025-12-04T09:24:20.3208756Z alias:          of:N*T*Cnvidia,tegra234-display
2025-12-04T09:24:20.3209127Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:24:20.3209521Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:24:20.3209984Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:24:20.3210403Z depends:        i2c-core,drm
2025-12-04T09:24:20.3210740Z retpoline:      Y
2025-12-04T09:24:20.3211032Z name:           nvidia
2025-12-04T09:24:20.3211418Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:24:20.3211917Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:24:20.3212393Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:24:20.3212829Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:24:20.3213147Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:24:20.3213456Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:24:20.3213775Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:24:20.3214086Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:24:20.3214403Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:24:20.3214775Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:24:20.3215171Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:24:20.3215515Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:24:20.3215825Z parm:           NVreg_EnableMSI:int
2025-12-04T09:24:20.3216128Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:24:20.3216508Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:24:20.3216929Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:24:20.3217317Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:24:20.3217751Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:24:20.3218176Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:24:20.3218609Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:24:20.3219302Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:24:20.3219651Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:24:20.3220030Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:24:20.3220411Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:24:20.3220761Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:24:20.3221086Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:24:20.3221419Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:24:20.3221749Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:24:20.3222182Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:24:20.3222536Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:24:20.3222901Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:24:20.3223263Z parm:           NVreg_RegisterPlatformDeviceDriver:int
2025-12-04T09:24:20.3223635Z parm:           NVreg_EnableResizableBar:int
2025-12-04T09:24:20.3223972Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:24:20.3224328Z parm:           NVreg_EnableNonblockingOpen:int
2025-12-04T09:24:20.3224693Z parm:           NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:24:20.3225039Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:24:20.3225384Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:24:20.3225723Z parm:           NVreg_RmMsg:charp
2025-12-04T09:24:20.3226012Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:24:20.3226341Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:24:20.3226670Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:24:20.3226998Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:24:20.3227328Z parm:           NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:24:20.3227696Z parm:           NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:24:20.3228055Z parm:           NVreg_ImexChannelCount:int
2025-12-04T09:24:20.3228383Z parm:           NVreg_CreateImexChannel0:int
2025-12-04T09:24:20.3228740Z parm:           NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:24:20.3229119Z parm:           rm_firmware_active:charp
2025-12-04T09:24:20.3229430Z + set +e
2025-12-04T09:24:20.3229619Z + nvidia-smi
2025-12-04T09:24:21.7741306Z Thu Dec  4 09:24:21 2025       
2025-12-04T09:24:21.7741726Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:21.7742258Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:24:21.7742761Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:21.7743317Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:24:21.7743883Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:24:21.7744336Z |                                         |                        |               MIG M. |
2025-12-04T09:24:21.7744692Z |=========================================+========================+======================|
2025-12-04T09:24:21.7830489Z |   0  NVIDIA A10G                    Off |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:24:21.7830990Z |  0%   25C    P0             59W /  300W |       0MiB /  23028MiB |      4%      Default |
2025-12-04T09:24:21.7831484Z |                                         |                        |                  N/A |
2025-12-04T09:24:21.7831893Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:21.7832621Z 
2025-12-04T09:24:21.7832802Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:21.7833373Z | Processes:                                                                              |
2025-12-04T09:24:21.7833848Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:24:21.7834280Z |        ID   ID                                                               Usage      |
2025-12-04T09:24:21.7834951Z |=========================================================================================|
2025-12-04T09:24:21.7836529Z |  No running processes found                                                             |
2025-12-04T09:24:21.7837151Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:22.2072746Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:24:23.6606009Z NVIDIA A10G
2025-12-04T09:24:23.9341735Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:24:23.9342037Z + '[' 0 -eq 0 ']'
2025-12-04T09:24:23.9342274Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:24:23.9342571Z + set -e
2025-12-04T09:24:23.9342777Z INFO: Ignoring allowed status 0
2025-12-04T09:24:23.9351362Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:24:23.9355343Z + sudo yum install -y yum-utils
2025-12-04T09:24:24.3819456Z Last metadata expiration check: 0:08:49 ago on Thu Dec  4 09:15:35 2025.
2025-12-04T09:24:24.4088182Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:24:24.4671727Z Dependencies resolved.
2025-12-04T09:24:24.4965010Z Nothing to do.
2025-12-04T09:24:24.4965400Z Complete!
2025-12-04T09:24:24.5974731Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:24:24.5975333Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:24:24.5976277Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:24:24.8822650Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:24:24.9367664Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:24:25.4654483Z nvidia-container-toolkit                         18 kB/s | 833  B     00:00    
2025-12-04T09:24:25.5493965Z Dependencies resolved.
2025-12-04T09:24:25.5781378Z ================================================================================
2025-12-04T09:24:25.5781874Z  Package                       Arch   Version    Repository                Size
2025-12-04T09:24:25.5782301Z ================================================================================
2025-12-04T09:24:25.5782635Z Downgrading:
2025-12-04T09:24:25.5783046Z  libnvidia-container-tools     x86_64 1.17.8-1   nvidia-container-toolkit  40 k
2025-12-04T09:24:25.5783662Z  libnvidia-container1          x86_64 1.17.8-1   nvidia-container-toolkit 1.0 M
2025-12-04T09:24:25.5784268Z  nvidia-container-toolkit      x86_64 1.17.8-1   nvidia-container-toolkit 1.2 M
2025-12-04T09:24:25.5784902Z  nvidia-container-toolkit-base x86_64 1.17.8-1   nvidia-container-toolkit 5.8 M
2025-12-04T09:24:25.5785289Z 
2025-12-04T09:24:25.5785386Z Transaction Summary
2025-12-04T09:24:25.5785630Z ================================================================================
2025-12-04T09:24:25.5785966Z Downgrade  4 Packages
2025-12-04T09:24:25.5786117Z 
2025-12-04T09:24:25.5786227Z Total download size: 8.0 M
2025-12-04T09:24:25.5786889Z Downloading Packages:
2025-12-04T09:24:25.6226233Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 951 kB/s |  40 kB     00:00    
2025-12-04T09:24:25.6667973Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm  11 MB/s | 1.0 MB     00:00    
2025-12-04T09:24:25.7178611Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 9.0 MB/s | 1.2 MB     00:00    
2025-12-04T09:24:25.8462309Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x  26 MB/s | 5.8 MB     00:00    
2025-12-04T09:24:25.8471942Z --------------------------------------------------------------------------------
2025-12-04T09:24:25.8474866Z Total                                            30 MB/s | 8.0 MB     00:00     
2025-12-04T09:24:25.8477753Z Running transaction check
2025-12-04T09:24:25.8597047Z Transaction check succeeded.
2025-12-04T09:24:25.8597622Z Running transaction test
2025-12-04T09:24:25.9101319Z Transaction test succeeded.
2025-12-04T09:24:25.9104582Z Running transaction
2025-12-04T09:24:26.7566295Z   Preparing        :                                                        1/1 
2025-12-04T09:24:26.8884432Z   Downgrading      : nvidia-container-toolkit-base-1.17.8-1.x86_64          1/8 
2025-12-04T09:24:26.9142871Z   Downgrading      : libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:24:26.9952998Z   Running scriptlet: libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:24:27.1282295Z   Downgrading      : libnvidia-container-tools-1.17.8-1.x86_64              3/8 
2025-12-04T09:24:27.1589281Z   Downgrading      : nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:24:27.2154121Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:24:27.2229654Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:24:27.2230276Z   Cleanup          : nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:24:27.2599947Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:24:27.2664651Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:24:27.2665264Z   Cleanup          : libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:24:27.3046373Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:24:27.3128728Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:24:27.3129348Z   Cleanup          : libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:24:27.3532863Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:24:27.3610370Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:24:27.3611359Z   Cleanup          : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:24:27.4005285Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:24:27.4690526Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               8/8 
2025-12-04T09:24:51.3335602Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:24:51.3338100Z   Verifying        : libnvidia-container-tools-1.17.8-1.x86_64              1/8 
2025-12-04T09:24:51.3339302Z   Verifying        : libnvidia-container-tools-1.18.1-1.x86_64              2/8 
2025-12-04T09:24:51.3340030Z   Verifying        : libnvidia-container1-1.17.8-1.x86_64                   3/8 
2025-12-04T09:24:51.3340673Z   Verifying        : libnvidia-container1-1.18.1-1.x86_64                   4/8 
2025-12-04T09:24:51.3342601Z   Verifying        : nvidia-container-toolkit-1.17.8-1.x86_64               5/8 
2025-12-04T09:24:51.3343179Z   Verifying        : nvidia-container-toolkit-1.18.1-1.x86_64               6/8 
2025-12-04T09:24:51.3343760Z   Verifying        : nvidia-container-toolkit-base-1.17.8-1.x86_64          7/8 
2025-12-04T09:24:51.4893606Z   Verifying        : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:24:51.4893606Z ================================================================================
2025-12-04T09:24:51.4894221Z WARNING:
2025-12-04T09:24:51.4894463Z   A newer release of "Amazon Linux" is available.
2025-12-04T09:24:51.4894712Z 
2025-12-04T09:24:51.4894802Z   Available Versions:
2025-12-04T09:24:51.4894953Z 
2025-12-04T09:24:51.4895067Z   Version 2023.9.20250929:
2025-12-04T09:24:51.4895380Z     Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:24:51.4895658Z 
2025-12-04T09:24:51.4895782Z       dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:24:51.4896012Z 
2025-12-04T09:24:51.4896096Z     Release notes:
2025-12-04T09:24:51.4896531Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:24:51.4896936Z 
2025-12-04T09:24:51.4897321Z   Version 2023.9.20251014:
2025-12-04T09:24:51.4897646Z     Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:24:51.4897923Z 
2025-12-04T09:24:51.4898040Z       dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:24:51.4898263Z 
2025-12-04T09:24:51.4898354Z     Release notes:
2025-12-04T09:24:51.4898765Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:24:51.4899290Z 
2025-12-04T09:24:51.4899378Z   Version 2023.9.20251020:
2025-12-04T09:24:51.4899883Z     Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:24:51.4900150Z 
2025-12-04T09:24:51.4900263Z       dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:24:51.4900490Z 
2025-12-04T09:24:51.4900573Z     Release notes:
2025-12-04T09:24:51.4900984Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:24:51.4901377Z 
2025-12-04T09:24:51.4901469Z   Version 2023.9.20251027:
2025-12-04T09:24:51.4901782Z     Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:24:51.4902057Z 
2025-12-04T09:24:51.4902172Z       dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:24:51.4902391Z 
2025-12-04T09:24:51.4902477Z     Release notes:
2025-12-04T09:24:51.4902888Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:24:51.4903282Z 
2025-12-04T09:24:51.4903369Z   Version 2023.9.20251105:
2025-12-04T09:24:51.4903682Z     Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:24:51.4903954Z 
2025-12-04T09:24:51.4904073Z       dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:24:51.4904293Z 
2025-12-04T09:24:51.4904376Z     Release notes:
2025-12-04T09:24:51.4904783Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:24:51.4905185Z 
2025-12-04T09:24:51.4905272Z   Version 2023.9.20251110:
2025-12-04T09:24:51.4905589Z     Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:24:51.4905862Z 
2025-12-04T09:24:51.4905975Z       dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:24:51.4906204Z 
2025-12-04T09:24:51.4906288Z     Release notes:
2025-12-04T09:24:51.4906698Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:24:51.4907092Z 
2025-12-04T09:24:51.4907189Z   Version 2023.9.20251117:
2025-12-04T09:24:51.4907500Z     Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:24:51.4908123Z 
2025-12-04T09:24:51.4908285Z       dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:24:51.4908541Z 
2025-12-04T09:24:51.4908642Z     Release notes:
2025-12-04T09:24:51.4909054Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:24:51.4909458Z 
2025-12-04T09:24:51.4909571Z ================================================================================
2025-12-04T09:24:51.5485526Z  
2025-12-04T09:24:51.5485675Z 
2025-12-04T09:24:51.5486061Z Downgraded:
2025-12-04T09:24:51.5486599Z   libnvidia-container-tools-1.17.8-1.x86_64                                     
2025-12-04T09:24:51.5487417Z   libnvidia-container1-1.17.8-1.x86_64                                          
2025-12-04T09:24:51.5488241Z   nvidia-container-toolkit-1.17.8-1.x86_64                                      
2025-12-04T09:24:51.5489094Z   nvidia-container-toolkit-base-1.17.8-1.x86_64                                 
2025-12-04T09:24:51.5489579Z 
2025-12-04T09:24:51.5489699Z Complete!
2025-12-04T09:24:51.6253014Z + sudo systemctl restart docker
2025-12-04T09:24:58.6056308Z Thu Dec  4 09:24:58 2025       
2025-12-04T09:24:58.6056718Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:58.6057252Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:24:58.6057772Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:58.6058667Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:24:58.6059339Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:24:58.6059805Z |                                         |                        |               MIG M. |
2025-12-04T09:24:58.6060153Z |=========================================+========================+======================|
2025-12-04T09:24:58.6151737Z |   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:24:58.6152451Z |  0%   25C    P0             55W /  300W |       0MiB /  23028MiB |      4%      Default |
2025-12-04T09:24:58.6152853Z |                                         |                        |                  N/A |
2025-12-04T09:24:58.6153373Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:58.6153782Z 
2025-12-04T09:24:58.6154001Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:58.6154464Z | Processes:                                                                              |
2025-12-04T09:24:58.6154933Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:24:58.6155363Z |        ID   ID                                                               Usage      |
2025-12-04T09:24:58.6155730Z |=========================================================================================|
2025-12-04T09:24:58.6157493Z |  No running processes found                                                             |
2025-12-04T09:24:58.6158026Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:58.7881908Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:24:58.9464443Z 3.13: Pulling from docker/library/python
2025-12-04T09:24:59.0274308Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:24:59.0274739Z eae668646f44: Pulling fs layer
2025-12-04T09:24:59.0275051Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:24:59.0275326Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:24:59.0275593Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:24:59.0275854Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:24:59.0276117Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:24:59.0276386Z 967a3b1c8fef: Waiting
2025-12-04T09:24:59.0276607Z a64e1a44f22a: Waiting
2025-12-04T09:24:59.0276831Z 7c40a3faff76: Waiting
2025-12-04T09:24:59.0277048Z 52655f8a5bcc: Waiting
2025-12-04T09:24:59.1616775Z eae668646f44: Verifying Checksum
2025-12-04T09:24:59.1617179Z eae668646f44: Download complete
2025-12-04T09:24:59.2038458Z 53c88f1dfeb7: Verifying Checksum
2025-12-04T09:24:59.2038844Z 53c88f1dfeb7: Download complete
2025-12-04T09:24:59.2584533Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:24:59.2584879Z 967a3b1c8fef: Download complete
2025-12-04T09:24:59.2728254Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:24:59.2728626Z ff2e6e687b6c: Download complete
2025-12-04T09:24:59.3530703Z 52655f8a5bcc: Verifying Checksum
2025-12-04T09:24:59.3531044Z 52655f8a5bcc: Download complete
2025-12-04T09:24:59.3875869Z a64e1a44f22a: Verifying Checksum
2025-12-04T09:24:59.3876197Z a64e1a44f22a: Download complete
2025-12-04T09:24:59.9818936Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:24:59.9819446Z 7c40a3faff76: Download complete
2025-12-04T09:25:01.0106959Z 53c88f1dfeb7: Pull complete
2025-12-04T09:25:01.7359545Z eae668646f44: Pull complete
2025-12-04T09:25:04.2578581Z ff2e6e687b6c: Pull complete
2025-12-04T09:25:10.9948548Z 7c40a3faff76: Pull complete
2025-12-04T09:25:11.2831292Z 967a3b1c8fef: Pull complete
2025-12-04T09:25:12.0497775Z a64e1a44f22a: Pull complete
2025-12-04T09:25:12.0722482Z 52655f8a5bcc: Pull complete
2025-12-04T09:25:12.0863690Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:25:12.0903246Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:25:19.1743311Z Thu Dec  4 09:25:19 2025       
2025-12-04T09:25:19.1743728Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:25:19.1744261Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:25:19.1744782Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:25:19.1745309Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:25:19.1748006Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:25:19.1748466Z |                                         |                        |               MIG M. |
2025-12-04T09:25:19.1748819Z |=========================================+========================+======================|
2025-12-04T09:25:19.1894109Z |   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:25:19.1894600Z |  0%   22C    P8             10W /  300W |       0MiB /  23028MiB |      0%      Default |
2025-12-04T09:25:19.1895005Z |                                         |                        |                  N/A |
2025-12-04T09:25:19.1895415Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:25:19.1898614Z 
2025-12-04T09:25:19.1899420Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:25:19.1900019Z | Processes:                                                                              |
2025-12-04T09:25:19.1900596Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:25:19.1901036Z |        ID   ID                                                               Usage      |
2025-12-04T09:25:19.1901467Z |=========================================================================================|
2025-12-04T09:25:19.1905329Z |  No running processes found                                                             |
2025-12-04T09:25:19.1905938Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:25:20.5885896Z Command completed after 1 attempt(s).
2025-12-04T09:25:20.5990580Z Prepare all required actions
2025-12-04T09:25:20.6021779Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:25:20.6022115Z with:
2025-12-04T09:25:20.6022835Z   github-token: ***
2025-12-04T09:25:20.6023057Z env:
2025-12-04T09:25:20.6023269Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:20.6023531Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:20.6023825Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:20.6024178Z ##[endgroup]
2025-12-04T09:25:20.6039516Z ##[group]Run set -eux
2025-12-04T09:25:20.6039758Z set -eux
2025-12-04T09:25:20.6040193Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
2025-12-04T09:25:20.6055483Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:20.6055855Z env:
2025-12-04T09:25:20.6056062Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:20.6056321Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:20.6056667Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:20.6057156Z   GITHUB_TOKEN: ***
2025-12-04T09:25:20.6057377Z ##[endgroup]
2025-12-04T09:25:20.6100608Z + python3 .github/scripts/get_workflow_job_id.py 19922826259 i-0f694664a515f0ebd
2025-12-04T09:25:22.3437664Z Setting output job-id=57118183212
2025-12-04T09:25:22.3438583Z Setting output job-name=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:22.3553224Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:25:22.3553976Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:25:22.3554956Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
2025-12-04T09:25:22.3555835Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:25:22.3565624Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:22.3565992Z env:
2025-12-04T09:25:22.3566208Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:22.3566476Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:22.3566785Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:22.3567324Z   JOB_ID: 57118183212
2025-12-04T09:25:22.3568038Z   JOB_NAME: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:22.3568785Z   WORKFLOW_NAME: periodic
2025-12-04T09:25:22.3569051Z   WORKFLOW_RUN_ID: 19922826259
2025-12-04T09:25:22.3569328Z   MONITOR_LOG_INTERVAL: 5
2025-12-04T09:25:22.3569597Z   MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:25:22.3569888Z ##[endgroup]
2025-12-04T09:25:22.6485804Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:25:23.0626080Z Collecting psutil==5.9.8
2025-12-04T09:25:23.0781041Z   Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:25:23.1588570Z Collecting dataclasses_json==0.6.7
2025-12-04T09:25:23.1619865Z   Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:25:23.1917542Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:25:23.1952343Z   Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:25:23.3254254Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:25:23.3286249Z   Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:25:23.3529199Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:25:23.3560478Z   Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:25:23.4143696Z Collecting packaging>=17.0
2025-12-04T09:25:23.4174379Z   Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:25:23.4417285Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:25:23.4447745Z   Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:25:23.4968224Z Collecting typing-extensions>=3.7.4
2025-12-04T09:25:23.5000889Z   Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:25:23.5928431Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:25:23.8734648Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
2025-12-04T09:25:24.0712250Z Prepare all required actions
2025-12-04T09:25:24.0712609Z Getting action download info
2025-12-04T09:25:24.3664923Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:25:24.6197110Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:25:25.0137652Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:25:25.0138711Z with:
2025-12-04T09:25:25.0139219Z   name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:25:25.0140044Z   s3-bucket: gha-artifacts
2025-12-04T09:25:25.0140397Z env:
2025-12-04T09:25:25.0141231Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:25.0141745Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:25.0142127Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:25.0142890Z ##[endgroup]
2025-12-04T09:25:25.0180788Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:25:25.0181440Z with:
2025-12-04T09:25:25.0181767Z   name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:25:25.0182286Z   s3-bucket: gha-artifacts
2025-12-04T09:25:25.0182659Z   region: us-east-1
2025-12-04T09:25:25.0182984Z env:
2025-12-04T09:25:25.0183282Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:25.0183641Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:25.0184061Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:25.0184500Z ##[endgroup]
2025-12-04T09:25:25.4968891Z (node:59427) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:25:25.4969416Z 
2025-12-04T09:25:25.4969609Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:25:25.4970147Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:25:25.4970938Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:25:25.7788346Z Found 1 objects with prefix pytorch/pytorch/19922826259/linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck/
2025-12-04T09:25:25.7789200Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:25:33.1319622Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:25:33.1324140Z Artifact download has finished successfully
2025-12-04T09:25:33.1701045Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:25:33.1701405Z unzip -o artifacts.zip
2025-12-04T09:25:33.1712075Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:33.1712447Z env:
2025-12-04T09:25:33.1712648Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:33.1712904Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:33.1713215Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:33.1713578Z ##[endgroup]
2025-12-04T09:25:33.1797288Z Archive:  artifacts.zip
2025-12-04T09:25:33.1799094Z    creating: dist/
2025-12-04T09:25:35.2538762Z   inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl  
2025-12-04T09:25:35.2675507Z   inflating: dist/.ninja_log         
2025-12-04T09:25:35.2676534Z    creating: build/custom_test_artifacts/
2025-12-04T09:25:35.2677138Z    creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:25:35.2677797Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:25:35.2678609Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:25:35.2687720Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:25:35.2688672Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:25:35.2689564Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:25:35.2690569Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:25:35.2691867Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:25:35.2695018Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:25:35.2696719Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:25:35.2698113Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:25:35.2699219Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:25:35.2700223Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:25:35.2703559Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:25:35.2705199Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:25:35.2706887Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:25:35.2709694Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:25:35.2712564Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:25:35.2713660Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:25:35.2714708Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:25:35.2775214Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:25:35.2837429Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:25:35.2839241Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:25:35.2904892Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:25:35.2906326Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:25:35.2908067Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:25:35.2909217Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:25:35.2910711Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:25:35.2912203Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:25:35.2913652Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:25:35.2915074Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:25:35.2916483Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:25:35.2917812Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:25:35.2919139Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:25:35.2920107Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:25:35.2921533Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:25:35.2923331Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:25:35.2926370Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:25:35.3002410Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:25:35.3004022Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:25:35.3079117Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:25:35.3079911Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/
2025-12-04T09:25:35.3080949Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/
2025-12-04T09:25:35.3082407Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:25:35.3083655Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/
2025-12-04T09:25:35.3085047Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts  
2025-12-04T09:25:35.3086622Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make  
2025-12-04T09:25:35.3088126Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make  
2025-12-04T09:25:35.3089515Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt  
2025-12-04T09:25:35.3090958Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:25:35.3092145Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make  
2025-12-04T09:25:35.3093216Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake  
2025-12-04T09:25:35.3094008Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make  
2025-12-04T09:25:35.3094804Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make  
2025-12-04T09:25:35.3115386Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d  
2025-12-04T09:25:35.3319923Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o  
2025-12-04T09:25:35.3320661Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/
2025-12-04T09:25:35.3321458Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts  
2025-12-04T09:25:35.3322790Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make  
2025-12-04T09:25:35.3324035Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make  
2025-12-04T09:25:35.3325159Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt  
2025-12-04T09:25:35.3326318Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:25:35.3327472Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make  
2025-12-04T09:25:35.3329125Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake  
2025-12-04T09:25:35.3330283Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make  
2025-12-04T09:25:35.3331656Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make  
2025-12-04T09:25:35.3354574Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d  
2025-12-04T09:25:35.3438631Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o  
2025-12-04T09:25:35.3440099Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:25:35.3441202Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:25:35.3442197Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks  
2025-12-04T09:25:35.3443126Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2  
2025-12-04T09:25:35.3445685Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:25:35.3446599Z   inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc  
2025-12-04T09:25:35.3450051Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt  
2025-12-04T09:25:35.3451113Z   inflating: build/custom_test_artifacts/custom-op-build/Makefile  
2025-12-04T09:25:35.3453566Z   inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake  
2025-12-04T09:25:35.3630276Z   inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so  
2025-12-04T09:25:35.3688333Z   inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops  
2025-12-04T09:25:35.3689039Z    creating: build/custom_test_artifacts/jit-hook-build/
2025-12-04T09:25:35.3689678Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/
2025-12-04T09:25:35.3690511Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/
2025-12-04T09:25:35.3698642Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:25:35.3699642Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/
2025-12-04T09:25:35.3700517Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:25:35.3701720Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:25:35.3702699Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:25:35.3705368Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:25:35.3707125Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:25:35.3709098Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:25:35.3710071Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:25:35.3711057Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:25:35.3714330Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:25:35.3715960Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:25:35.3717671Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:25:35.3720413Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:25:35.3722635Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:25:35.3723709Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:25:35.3724706Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:25:35.3785773Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:25:35.3847643Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:25:35.3849147Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:25:35.3915434Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:25:35.3916865Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:25:35.3918291Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:25:35.3919758Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:25:35.3921145Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:25:35.3922527Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:25:35.3923947Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:25:35.3925369Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:25:35.3926713Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:25:35.3927983Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:25:35.3929229Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:25:35.3930461Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:25:35.3931757Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:25:35.3932941Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:25:35.3936300Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:25:35.4012145Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:25:35.4012941Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:25:35.4089640Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:25:35.4091188Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/
2025-12-04T09:25:35.4092218Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/
2025-12-04T09:25:35.4092849Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:25:35.4093533Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/
2025-12-04T09:25:35.4094305Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts  
2025-12-04T09:25:35.4095192Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make  
2025-12-04T09:25:35.4096038Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make  
2025-12-04T09:25:35.4096825Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt  
2025-12-04T09:25:35.4097639Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake  
2025-12-04T09:25:35.4099040Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make  
2025-12-04T09:25:35.4100281Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake  
2025-12-04T09:25:35.4101432Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make  
2025-12-04T09:25:35.4102987Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make  
2025-12-04T09:25:35.4125429Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d  
2025-12-04T09:25:35.4191424Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o  
2025-12-04T09:25:35.4192615Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:25:35.4193404Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:25:35.4194110Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks  
2025-12-04T09:25:35.4195232Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2  
2025-12-04T09:25:35.4197639Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:25:35.4198291Z   inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc  
2025-12-04T09:25:35.4201443Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt  
2025-12-04T09:25:35.4202562Z   inflating: build/custom_test_artifacts/jit-hook-build/Makefile  
2025-12-04T09:25:35.4203826Z   inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake  
2025-12-04T09:25:35.4244589Z   inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks  
2025-12-04T09:25:35.4245313Z    creating: build/custom_test_artifacts/custom-backend-build/
2025-12-04T09:25:35.4246040Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/
2025-12-04T09:25:35.4246935Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/
2025-12-04T09:25:35.4255093Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:25:35.4256082Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/
2025-12-04T09:25:35.4257050Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:25:35.4258067Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:25:35.4259231Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:25:35.4262015Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:25:35.4263907Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:25:35.4265320Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:25:35.4266392Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:25:35.4267474Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:25:35.4270724Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:25:35.4272311Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:25:35.4274045Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:25:35.4276728Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:25:35.4278902Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:25:35.4280066Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:25:35.4281158Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:25:35.4342832Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:25:35.4404162Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:25:35.4405795Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:25:35.4471527Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:25:35.4473079Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:25:35.4474614Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:25:35.4476176Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:25:35.4477690Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:25:35.4479176Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:25:35.4480668Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:25:35.4482185Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:25:35.4483630Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:25:35.4485273Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:25:35.4486604Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:25:35.4487912Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:25:35.4489218Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:25:35.4490573Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:25:35.4492976Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:25:35.4569378Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:25:35.4570596Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:25:35.4646755Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:25:35.4647913Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/
2025-12-04T09:25:35.4648801Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/
2025-12-04T09:25:35.4649755Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:25:35.4650759Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/
2025-12-04T09:25:35.4651908Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts  
2025-12-04T09:25:35.4653248Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make  
2025-12-04T09:25:35.4654509Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make  
2025-12-04T09:25:35.4655883Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt  
2025-12-04T09:25:35.4657102Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:25:35.4658304Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make  
2025-12-04T09:25:35.4659640Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake  
2025-12-04T09:25:35.4660877Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make  
2025-12-04T09:25:35.4662113Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make  
2025-12-04T09:25:35.4666167Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d  
2025-12-04T09:25:35.4790306Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o  
2025-12-04T09:25:35.4791514Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/
2025-12-04T09:25:35.4792948Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts  
2025-12-04T09:25:35.4794427Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make  
2025-12-04T09:25:35.4795840Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make  
2025-12-04T09:25:35.4797448Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt  
2025-12-04T09:25:35.4798868Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:25:35.4800534Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make  
2025-12-04T09:25:35.4801995Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake  
2025-12-04T09:25:35.4803481Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make  
2025-12-04T09:25:35.4804904Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make  
2025-12-04T09:25:35.4825179Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d  
2025-12-04T09:25:35.4882050Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o  
2025-12-04T09:25:35.4883571Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:25:35.4884914Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:25:35.4886093Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks  
2025-12-04T09:25:35.4887273Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2  
2025-12-04T09:25:35.4889368Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:25:35.4890469Z   inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc  
2025-12-04T09:25:35.4894070Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt  
2025-12-04T09:25:35.4895257Z   inflating: build/custom_test_artifacts/custom-backend-build/Makefile  
2025-12-04T09:25:35.4896585Z   inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake  
2025-12-04T09:25:35.5001300Z   inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so  
2025-12-04T09:25:35.5042385Z   inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend  
2025-12-04T09:25:35.5043280Z    creating: build/lib/
2025-12-04T09:25:35.5127342Z   inflating: build/lib/libprotobuf-lite.a  
2025-12-04T09:25:35.5574299Z   inflating: build/lib/libprotobuf.a  
2025-12-04T09:25:35.6073620Z   inflating: build/lib/libprotoc.a   
2025-12-04T09:25:35.6082814Z   inflating: build/lib/libpthreadpool.a  
2025-12-04T09:25:35.6091618Z   inflating: build/lib/libcpuinfo.a  
2025-12-04T09:25:35.6099968Z   inflating: build/lib/libcpuinfo_internals.a  
2025-12-04T09:25:35.6100970Z   inflating: build/lib/libclog.a     
2025-12-04T09:25:35.6121375Z   inflating: build/lib/libpytorch_qnnpack.a  
2025-12-04T09:25:35.6124520Z   inflating: build/lib/libnnpack_reference_layers.a  
2025-12-04T09:25:35.6142820Z   inflating: build/lib/libnnpack.a   
2025-12-04T09:25:35.6331190Z   inflating: build/lib/libmicrokernels-prod.a  
2025-12-04T09:25:35.7216087Z   inflating: build/lib/libmicrokernels-all.a  
2025-12-04T09:25:35.7287397Z   inflating: build/lib/libgtest.a    
2025-12-04T09:25:35.7304539Z   inflating: build/lib/libgmock.a    
2025-12-04T09:25:35.7306008Z   inflating: build/lib/libgtest_main.a  
2025-12-04T09:25:35.7307159Z   inflating: build/lib/libgmock_main.a  
2025-12-04T09:25:35.7400253Z   inflating: build/lib/libXNNPACK.a  
2025-12-04T09:25:35.7477426Z   inflating: build/lib/libbenchmark.a  
2025-12-04T09:25:35.7478086Z   inflating: build/lib/libbenchmark_main.a  
2025-12-04T09:25:35.7479702Z   inflating: build/lib/libjitprofiling.a  
2025-12-04T09:25:35.7547619Z   inflating: build/lib/libasmjit.a   
2025-12-04T09:25:35.7556291Z   inflating: build/lib/libittnotify.a  
2025-12-04T09:25:35.8760370Z   inflating: build/lib/libfbgemm.a   
2025-12-04T09:25:35.8791609Z   inflating: build/lib/libtensorpipe_uv.a  
2025-12-04T09:25:35.9346920Z   inflating: build/lib/libtensorpipe.a  
2025-12-04T09:25:35.9595319Z   inflating: build/lib/libtensorpipe_cuda.a  
2025-12-04T09:25:35.9732404Z   inflating: build/lib/libgloo.a     
2025-12-04T09:25:35.9780593Z   inflating: build/lib/libonnx_proto.a  
2025-12-04T09:25:36.0230394Z   inflating: build/lib/libgloo_cuda.a  
2025-12-04T09:25:36.0953289Z   inflating: build/lib/libonnx.a     
2025-12-04T09:25:37.1288838Z   inflating: build/lib/libdnnl.a     
2025-12-04T09:25:37.1309066Z   inflating: build/lib/libfmt.a      
2025-12-04T09:25:37.1791595Z   inflating: build/lib/libkineto.a   
2025-12-04T09:25:37.1909136Z   inflating: build/lib/libc10.so     
2025-12-04T09:25:37.1959278Z   inflating: build/lib/libc10_cuda.so  
2025-12-04T09:25:37.1961519Z   inflating: build/lib/libcaffe2_nvrtc.so  
2025-12-04T09:25:37.1963286Z   inflating: build/lib/libtorch_global_deps.so  
2025-12-04T09:25:40.3326316Z   inflating: build/lib/libtorch_cpu.so  
2025-12-04T09:25:40.4116899Z   inflating: build/lib/libtorch_nvshmem.so  
2025-12-04T09:25:42.3928158Z   inflating: build/lib/libtorch_cuda.so  
2025-12-04T09:25:42.3929440Z   inflating: build/lib/libtorch.so   
2025-12-04T09:25:42.3981901Z   inflating: build/lib/libtorch_cuda_linalg.so  
2025-12-04T09:25:42.4054152Z   inflating: build/lib/libtorchbind_test.so  
2025-12-04T09:25:42.4073375Z   inflating: build/lib/libjitbackend_test.so  
2025-12-04T09:25:42.4097478Z   inflating: build/lib/libbackend_with_compiler.so  
2025-12-04T09:25:42.4124333Z   inflating: build/lib/libaoti_custom_ops.so  
2025-12-04T09:25:42.4127253Z   inflating: build/lib/libc10d_cuda_test.so  
2025-12-04T09:25:42.4131834Z   inflating: build/lib/libshm.so     
2025-12-04T09:25:42.6539722Z   inflating: build/lib/libtorch_python.so  
2025-12-04T09:25:42.6576519Z   inflating: build/lib/libnnapi_backend.so  
2025-12-04T09:25:42.6576847Z    creating: build/bin/
2025-12-04T09:25:42.7037471Z   inflating: build/bin/protoc-3.13.0.0  
2025-12-04T09:25:42.7496384Z   inflating: build/bin/protoc        
2025-12-04T09:25:42.7556821Z   inflating: build/bin/c10_AllocatorConfig_test  
2025-12-04T09:25:42.7612948Z   inflating: build/bin/c10_CompileTimeFunctionPointer_test  
2025-12-04T09:25:42.7670262Z   inflating: build/bin/c10_DeviceGuard_test  
2025-12-04T09:25:42.7728571Z   inflating: build/bin/c10_Device_test  
2025-12-04T09:25:42.7795210Z   inflating: build/bin/c10_DispatchKeySet_test  
2025-12-04T09:25:42.7849881Z   inflating: build/bin/c10_StreamGuard_test  
2025-12-04T09:25:42.7910159Z   inflating: build/bin/c10_Scalar_test  
2025-12-04T09:25:42.7972619Z   inflating: build/bin/c10_SizesAndStrides_test  
2025-12-04T09:25:42.8033655Z   inflating: build/bin/c10_InlineDeviceGuard_test  
2025-12-04T09:25:42.8096447Z   inflating: build/bin/c10_SymInt_test  
2025-12-04T09:25:42.8158625Z   inflating: build/bin/c10_InlineStreamGuard_test  
2025-12-04T09:25:42.8214347Z   inflating: build/bin/c10_ArrayRef_test  
2025-12-04T09:25:42.8291285Z   inflating: build/bin/c10_cow_test  
2025-12-04T09:25:42.8347132Z   inflating: build/bin/c10_ConstexprCrc_test  
2025-12-04T09:25:42.8403023Z   inflating: build/bin/c10_DeadlockDetection_test  
2025-12-04T09:25:42.8462276Z   inflating: build/bin/c10_Bitset_test  
2025-12-04T09:25:42.8526093Z   inflating: build/bin/c10_Enumerate_test  
2025-12-04T09:25:42.8584678Z   inflating: build/bin/c10_IntrusiveList_test  
2025-12-04T09:25:42.8641343Z   inflating: build/bin/c10_Half_test  
2025-12-04T09:25:42.8703613Z   inflating: build/bin/c10_LeftRight_test  
2025-12-04T09:25:42.8763766Z   inflating: build/bin/c10_NetworkFlow_test  
2025-12-04T09:25:42.8818878Z   inflating: build/bin/c10_Semaphore_test  
2025-12-04T09:25:42.8875115Z   inflating: build/bin/c10_Synchronized_test  
2025-12-04T09:25:42.8937698Z   inflating: build/bin/c10_ThreadLocal_test  
2025-12-04T09:25:42.8995529Z   inflating: build/bin/c10_TypeIndex_test  
2025-12-04T09:25:42.9053479Z   inflating: build/bin/c10_accumulate_test  
2025-12-04T09:25:42.9116136Z   inflating: build/bin/c10_bfloat16_test  
2025-12-04T09:25:42.9172780Z   inflating: build/bin/c10_bit_cast_test  
2025-12-04T09:25:42.9234423Z   inflating: build/bin/c10_complex_test  
2025-12-04T09:25:42.9297903Z   inflating: build/bin/c10_complex_math_test  
2025-12-04T09:25:42.9353251Z   inflating: build/bin/c10_error_test  
2025-12-04T09:25:42.9411929Z   inflating: build/bin/c10_exception_test  
2025-12-04T09:25:42.9468183Z   inflating: build/bin/c10_flags_test  
2025-12-04T09:25:42.9525008Z   inflating: build/bin/c10_generic_math_test  
2025-12-04T09:25:42.9581745Z   inflating: build/bin/c10_irange_test  
2025-12-04T09:25:42.9642086Z   inflating: build/bin/c10_lazy_test  
2025-12-04T09:25:42.9812411Z   inflating: build/bin/c10_intrusive_ptr_test  
2025-12-04T09:25:42.9875571Z   inflating: build/bin/c10_logging_test  
2025-12-04T09:25:42.9931317Z   inflating: build/bin/c10_nofatal_test  
2025-12-04T09:25:43.0014001Z   inflating: build/bin/c10_optional_test  
2025-12-04T09:25:43.0073325Z   inflating: build/bin/c10_registry_test  
2025-12-04T09:25:43.0141949Z   inflating: build/bin/c10_ordered_preserving_dict_test  
2025-12-04T09:25:43.0308090Z   inflating: build/bin/c10_small_vector_test  
2025-12-04T09:25:43.0371866Z   inflating: build/bin/c10_string_util_test  
2025-12-04T09:25:43.0430083Z   inflating: build/bin/c10_ssize_test  
2025-12-04T09:25:43.0486315Z   inflating: build/bin/c10_tempfile_test  
2025-12-04T09:25:43.0541545Z   inflating: build/bin/c10_string_view_test  
2025-12-04T09:25:43.0604350Z   inflating: build/bin/c10_typeid_test  
2025-12-04T09:25:43.0652980Z   inflating: build/bin/c10_intrusive_ptr_benchmark  
2025-12-04T09:25:43.0712522Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device  
2025-12-04T09:25:43.0771928Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream  
2025-12-04T09:25:43.0830916Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes  
2025-12-04T09:25:43.0890264Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test  
2025-12-04T09:25:43.0946537Z   inflating: build/bin/c10_cuda_CUDATest  
2025-12-04T09:25:43.1006222Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks  
2025-12-04T09:25:43.1064984Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads  
2025-12-04T09:25:43.1125007Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block  
2025-12-04T09:25:43.1747343Z   inflating: build/bin/vec_test_all_types_DEFAULT  
2025-12-04T09:25:43.2383868Z   inflating: build/bin/vec_test_all_types_AVX512  
2025-12-04T09:25:43.3029600Z   inflating: build/bin/vec_test_all_types_AVX2  
2025-12-04T09:25:43.3136100Z   inflating: build/bin/test_aoti_abi_check  
2025-12-04T09:25:43.3191584Z   inflating: build/bin/test_vec_half_DEFAULT  
2025-12-04T09:25:43.3247877Z   inflating: build/bin/test_vec_half_AVX512  
2025-12-04T09:25:43.3303485Z   inflating: build/bin/test_vec_half_AVX2  
2025-12-04T09:25:43.3384308Z   inflating: build/bin/Dict_test     
2025-12-04T09:25:43.3443471Z   inflating: build/bin/Dimname_test  
2025-12-04T09:25:43.3515730Z   inflating: build/bin/MaybeOwned_test  
2025-12-04T09:25:43.3579259Z   inflating: build/bin/NamedTensor_test  
2025-12-04T09:25:43.3644777Z   inflating: build/bin/apply_utils_test  
2025-12-04T09:25:43.3710085Z   inflating: build/bin/atest         
2025-12-04T09:25:43.3780584Z   inflating: build/bin/basic         
2025-12-04T09:25:43.3841277Z   inflating: build/bin/broadcast_test  
2025-12-04T09:25:43.3899168Z   inflating: build/bin/cpu_allocator_test  
2025-12-04T09:25:43.3963573Z   inflating: build/bin/cpu_generator_test  
2025-12-04T09:25:43.4022782Z   inflating: build/bin/cpu_profiling_allocator_test  
2025-12-04T09:25:43.4123674Z   inflating: build/bin/cpu_rng_test  
2025-12-04T09:25:43.4180889Z   inflating: build/bin/dlconvertor_test  
2025-12-04T09:25:43.4245634Z   inflating: build/bin/extension_backend_test  
2025-12-04T09:25:43.4306798Z   inflating: build/bin/half_test     
2025-12-04T09:25:43.4412717Z   inflating: build/bin/ivalue_test   
2025-12-04T09:25:43.4468571Z   inflating: build/bin/lazy_tensor_test  
2025-12-04T09:25:43.4527606Z   inflating: build/bin/math_kernel_test  
2025-12-04T09:25:43.4586742Z   inflating: build/bin/memory_format_test  
2025-12-04T09:25:43.4646634Z   inflating: build/bin/memory_overlapping_test  
2025-12-04T09:25:43.4706282Z   inflating: build/bin/mobile_memory_cleanup  
2025-12-04T09:25:43.4769162Z   inflating: build/bin/native_test   
2025-12-04T09:25:43.4825817Z   inflating: build/bin/operator_name_test  
2025-12-04T09:25:43.4882434Z   inflating: build/bin/operators_test  
2025-12-04T09:25:43.4941224Z   inflating: build/bin/packedtensoraccessor_test  
2025-12-04T09:25:43.5015795Z   inflating: build/bin/pow_test      
2025-12-04T09:25:43.5078327Z   inflating: build/bin/quantized_test  
2025-12-04T09:25:43.5134694Z   inflating: build/bin/reduce_ops_test  
2025-12-04T09:25:43.5191514Z   inflating: build/bin/reportMemoryUsage_test  
2025-12-04T09:25:43.5253854Z   inflating: build/bin/scalar_tensor_test  
2025-12-04T09:25:43.5317853Z   inflating: build/bin/scalar_test   
2025-12-04T09:25:43.5376005Z   inflating: build/bin/StorageUtils_test  
2025-12-04T09:25:43.5434056Z   inflating: build/bin/stride_properties_test  
2025-12-04T09:25:43.5520827Z   inflating: build/bin/tensor_iterator_test  
2025-12-04T09:25:43.5581152Z   inflating: build/bin/test_parallel  
2025-12-04T09:25:43.5638078Z   inflating: build/bin/thread_init_test  
2025-12-04T09:25:43.5699318Z   inflating: build/bin/type_ptr_test  
2025-12-04T09:25:43.5765141Z   inflating: build/bin/type_test     
2025-12-04T09:25:43.5823878Z   inflating: build/bin/undefined_tensor_test  
2025-12-04T09:25:43.5879646Z   inflating: build/bin/verify_api_visibility  
2025-12-04T09:25:43.5958536Z   inflating: build/bin/legacy_vmap_test  
2025-12-04T09:25:43.6015188Z   inflating: build/bin/weakref_test  
2025-12-04T09:25:43.6074078Z   inflating: build/bin/wrapdim_test  
2025-12-04T09:25:43.6131679Z   inflating: build/bin/xla_tensor_test  
2025-12-04T09:25:43.6197719Z   inflating: build/bin/IListRef_test  
2025-12-04T09:25:43.6312685Z   inflating: build/bin/List_test     
2025-12-04T09:25:43.6385603Z   inflating: build/bin/KernelFunction_test  
2025-12-04T09:25:43.6516301Z   inflating: build/bin/kernel_function_legacy_test  
2025-12-04T09:25:43.6619681Z   inflating: build/bin/kernel_function_test  
2025-12-04T09:25:43.6756230Z   inflating: build/bin/kernel_lambda_legacy_test  
2025-12-04T09:25:43.6867268Z   inflating: build/bin/kernel_lambda_test  
2025-12-04T09:25:43.6933515Z   inflating: build/bin/kernel_stackbased_test  
2025-12-04T09:25:43.7037583Z   inflating: build/bin/make_boxed_from_unboxed_functor_test  
2025-12-04T09:25:43.7094175Z   inflating: build/bin/CppSignature_test  
2025-12-04T09:25:43.7155956Z   inflating: build/bin/backend_fallback_test  
2025-12-04T09:25:43.7211261Z   inflating: build/bin/op_allowlist_test  
2025-12-04T09:25:43.7542280Z   inflating: build/bin/op_registration_test  
2025-12-04T09:25:43.7615621Z   inflating: build/bin/inline_container_test  
2025-12-04T09:25:43.7675134Z   inflating: build/bin/cuda_allocator_test  
2025-12-04T09:25:43.7734803Z   inflating: build/bin/cuda_apply_test  
2025-12-04T09:25:43.7801211Z   inflating: build/bin/cuda_atomic_ops_test  
2025-12-04T09:25:43.7864499Z   inflating: build/bin/cuda_caching_host_allocator_test  
2025-12-04T09:25:43.7942278Z   inflating: build/bin/cuda_complex_math_test  
2025-12-04T09:25:43.8008054Z   inflating: build/bin/cuda_complex_test  
2025-12-04T09:25:43.8078027Z   inflating: build/bin/cuda_cub_test  
2025-12-04T09:25:43.8136643Z   inflating: build/bin/cuda_cublas_handle_pool_test  
2025-12-04T09:25:43.8192423Z   inflating: build/bin/cuda_device_test  
2025-12-04T09:25:43.8264101Z   inflating: build/bin/cuda_distributions_test  
2025-12-04T09:25:43.8323333Z   inflating: build/bin/cuda_event_test  
2025-12-04T09:25:43.8381933Z   inflating: build/bin/cuda_dlconvertor_test  
2025-12-04T09:25:43.8437001Z   inflating: build/bin/cuda_exchange_device_test  
2025-12-04T09:25:43.8495954Z   inflating: build/bin/cuda_reportMemoryUsage_test  
2025-12-04T09:25:43.8551977Z   inflating: build/bin/cuda_allocatorTraceTracker_test  
2025-12-04T09:25:43.8609954Z   inflating: build/bin/cuda_integer_divider_test  
2025-12-04T09:25:43.8677600Z   inflating: build/bin/cuda_stream_test  
2025-12-04T09:25:43.8733268Z   inflating: build/bin/cuda_cudnn_test  
2025-12-04T09:25:43.8789436Z   inflating: build/bin/cuda_half_test  
2025-12-04T09:25:43.8852336Z   inflating: build/bin/cuda_generator_test  
2025-12-04T09:25:43.8908145Z   inflating: build/bin/cuda_optional_test  
2025-12-04T09:25:43.8966165Z   inflating: build/bin/cuda_packedtensoraccessor_test  
2025-12-04T09:25:43.9025343Z   inflating: build/bin/cuda_vectorized_test  
2025-12-04T09:25:44.0168661Z   inflating: build/bin/test_jit      
2025-12-04T09:25:44.0227906Z   inflating: build/bin/BackoffTest   
2025-12-04T09:25:44.0287766Z   inflating: build/bin/FileStoreTest  
2025-12-04T09:25:44.0657096Z   inflating: build/bin/test_lazy     
2025-12-04T09:25:44.0719942Z   inflating: build/bin/TCPStoreTest  
2025-12-04T09:25:44.0779988Z   inflating: build/bin/HashStoreTest  
2025-12-04T09:25:44.0794186Z   inflating: build/bin/ProcessGroupMPITest  
2025-12-04T09:25:44.0797374Z   inflating: build/bin/example_allreduce  
2025-12-04T09:25:44.0871378Z   inflating: build/bin/ProcessGroupGlooTest  
2025-12-04T09:25:44.0934749Z   inflating: build/bin/ProcessGroupGlooAsyncTest  
2025-12-04T09:25:44.1005826Z   inflating: build/bin/ProcessGroupNCCLTest  
2025-12-04T09:25:44.1073652Z   inflating: build/bin/ProcessGroupNCCLErrorsTest  
2025-12-04T09:25:44.1135802Z   inflating: build/bin/test_dist_autograd  
2025-12-04T09:25:44.1211614Z   inflating: build/bin/test_cpp_rpc  
2025-12-04T09:25:44.1214209Z   inflating: build/bin/parallel_benchmark  
2025-12-04T09:25:44.2437209Z   inflating: build/bin/test_api      
2025-12-04T09:25:44.2441364Z   inflating: build/bin/torch_shm_manager  
2025-12-04T09:25:44.2441692Z    creating: .additional_ci_files/
2025-12-04T09:25:44.2507433Z   inflating: .additional_ci_files/test-times.json  
2025-12-04T09:25:44.2745549Z   inflating: .additional_ci_files/test-class-times.json  
2025-12-04T09:25:44.2795452Z ##[group]Run rm artifacts.zip
2025-12-04T09:25:44.2795769Z rm artifacts.zip
2025-12-04T09:25:44.2805227Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:44.2805601Z env:
2025-12-04T09:25:44.2805995Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:44.2806259Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:44.2806568Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:44.2806912Z ##[endgroup]
2025-12-04T09:25:44.4282542Z ##[group]Run df -H
2025-12-04T09:25:44.4282779Z df -H
2025-12-04T09:25:44.4291859Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:44.4292436Z env:
2025-12-04T09:25:44.4292635Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:44.4292888Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:44.4293199Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:44.4293547Z ##[endgroup]
2025-12-04T09:25:44.4348782Z Filesystem        Size  Used Avail Use% Mounted on
2025-12-04T09:25:44.4349238Z devtmpfs          4.2M     0  4.2M   0% /dev
2025-12-04T09:25:44.4349580Z tmpfs              34G     0   34G   0% /dev/shm
2025-12-04T09:25:44.4349902Z tmpfs              14G  562k   14G   1% /run
2025-12-04T09:25:44.4350211Z /dev/nvme0n1p1    161G   54G  108G  34% /
2025-12-04T09:25:44.4350535Z tmpfs              34G   17k   34G   1% /tmp
2025-12-04T09:25:44.4350865Z /dev/nvme0n1p128   11M  1.4M  9.2M  13% /boot/efi
2025-12-04T09:25:44.4351209Z tmpfs             6.7G     0  6.7G   0% /run/user/0
2025-12-04T09:25:44.4387850Z Prepare all required actions
2025-12-04T09:25:44.4388883Z Getting action download info
2025-12-04T09:25:44.5914504Z ##[group]Run ./.github/actions/download-td-artifacts
2025-12-04T09:25:44.5914847Z with:
2025-12-04T09:25:44.5915033Z env:
2025-12-04T09:25:44.5915228Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:44.5915488Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:44.5915796Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:44.5916142Z ##[endgroup]
2025-12-04T09:25:44.6377533Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:25:44.6378021Z with:
2025-12-04T09:25:44.6378205Z   name: td_results
2025-12-04T09:25:44.6378436Z   s3-bucket: gha-artifacts
2025-12-04T09:25:44.6378684Z   region: us-east-1
2025-12-04T09:25:44.6378882Z env:
2025-12-04T09:25:44.6379175Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:44.6379427Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:44.6379726Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:44.6380255Z ##[endgroup]
2025-12-04T09:25:45.2579012Z (node:59451) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:25:45.2579999Z 
2025-12-04T09:25:45.2580344Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:25:45.2581299Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:25:45.2582319Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:25:45.3620565Z Found 1 objects with prefix pytorch/pytorch/19922826259/td_results/
2025-12-04T09:25:45.3621223Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:25:45.4207533Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:25:45.4213269Z Artifact download has finished successfully
2025-12-04T09:25:45.4556196Z ##[group]Run mkdir -p .additional_ci_files
2025-12-04T09:25:45.4556578Z mkdir -p .additional_ci_files
2025-12-04T09:25:45.4557013Z mv td_results.json .additional_ci_files/td_results.json || true
2025-12-04T09:25:45.4566999Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:45.4567360Z env:
2025-12-04T09:25:45.4567566Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:45.4567826Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:45.4568127Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:45.4568473Z ##[endgroup]
2025-12-04T09:25:45.4678414Z ##[group]Run .github/scripts/parse_ref.py
2025-12-04T09:25:45.4678799Z .github/scripts/parse_ref.py
2025-12-04T09:25:45.4687708Z shell: /usr/bin/bash -e {0}
2025-12-04T09:25:45.4687968Z env:
2025-12-04T09:25:45.4688171Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:45.4688429Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:45.4688727Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:45.4689080Z ##[endgroup]
2025-12-04T09:25:45.4929699Z Setting output branch=main
2025-12-04T09:25:45.5071513Z Prepare all required actions
2025-12-04T09:25:45.5071864Z Getting action download info
2025-12-04T09:25:45.6271156Z ##[group]Run ./.github/actions/filter-test-configs
2025-12-04T09:25:45.6271492Z with:
2025-12-04T09:25:45.6271874Z   github-token: ***
2025-12-04T09:25:45.6281175Z   test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": 
"linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T09:25:45.6291288Z   job-name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:45.6292020Z env:
2025-12-04T09:25:45.6292226Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:45.6292483Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:45.6292787Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:45.6293139Z ##[endgroup]
2025-12-04T09:25:45.6328954Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:25:45.6329244Z with:
2025-12-04T09:25:45.6329451Z   shell: bash
2025-12-04T09:25:45.6329665Z   timeout_minutes: 10
2025-12-04T09:25:45.6329901Z   max_attempts: 5
2025-12-04T09:25:45.6330122Z   retry_wait_seconds: 30
2025-12-04T09:25:45.6330914Z   command: set -eux
# PyYAML 6.0 doesn't work with MacOS x86 anymore
# This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2
python3 -m pip install requests==2.27.1 pyyaml==6.0.2

2025-12-04T09:25:45.6331935Z   polling_interval_seconds: 1
2025-12-04T09:25:45.6332209Z   warning_on_retry: true
2025-12-04T09:25:45.6332465Z   continue_on_error: false
2025-12-04T09:25:45.6332705Z env:
2025-12-04T09:25:45.6332900Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:45.6333157Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:45.6333456Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:45.6333939Z   GITHUB_TOKEN: ***
2025-12-04T09:25:45.6334154Z ##[endgroup]
2025-12-04T09:25:45.7358016Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2
2025-12-04T09:25:45.9721811Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:25:46.0952086Z Collecting requests==2.27.1
2025-12-04T09:25:46.1119879Z   Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
2025-12-04T09:25:46.3103990Z Collecting pyyaml==6.0.2
2025-12-04T09:25:46.3163394Z   Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB)
2025-12-04T09:25:46.3922158Z Collecting certifi>=2017.4.17
2025-12-04T09:25:46.3959198Z   Downloading certifi-2025.11.12-py3-none-any.whl (159 kB)
2025-12-04T09:25:46.4029172Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10)
2025-12-04T09:25:46.4032766Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10)
2025-12-04T09:25:46.8450818Z Collecting charset-normalizer~=2.0.0
2025-12-04T09:25:46.8489709Z   Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
2025-12-04T09:25:46.9367829Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml
2025-12-04T09:25:47.0597635Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1
2025-12-04T09:25:47.7122331Z Command completed after 1 attempt(s).
2025-12-04T09:25:47.7190272Z ##[group]Run set -x
2025-12-04T09:25:47.7202134Z set -x
2025-12-04T09:25:47.7202389Z 
2025-12-04T09:25:47.7202775Z # Use relative path here as this could be checked out anywhere, not necessarily
2025-12-04T09:25:47.7203250Z # in runner workspace
2025-12-04T09:25:47.7203641Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py"
2025-12-04T09:25:47.7213227Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:47.7213594Z env:
2025-12-04T09:25:47.7213791Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:47.7214045Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:47.7214347Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:47.7214681Z ##[endgroup]
2025-12-04T09:25:47.7245916Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py
2025-12-04T09:25:47.7429525Z Setting output branch=main
2025-12-04T09:25:47.7485716Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}"
2025-12-04T09:25:47.7486160Z [36;1mecho "Workflow: ${GITHUB_WORKFLOW}"[0m
2025-12-04T09:25:47.7486494Z [36;1mecho "Job name: ${JOB_NAME}"[0m
2025-12-04T09:25:47.7486801Z [36;1m[0m
2025-12-04T09:25:47.7487162Z [36;1m# Use relative path here as this could be checked out anywhere, not necessarily[0m
2025-12-04T09:25:47.7487639Z [36;1m# in runner workspace[0m
2025-12-04T09:25:47.7488062Z [36;1mpython3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \[0m
2025-12-04T09:25:47.7488533Z [36;1m  --workflow "${GITHUB_WORKFLOW}" \[0m
2025-12-04T09:25:47.7488864Z [36;1m  --job-name "${JOB_NAME}" \[0m
2025-12-04T09:25:47.7498449Z [36;1m  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, 
"runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}" \[0m
2025-12-04T09:25:47.7508816Z [36;1m  --selected-test-configs "" \[0m
2025-12-04T09:25:47.7509252Z [36;1m  --pr-number "${PR_NUMBER}" \[0m
2025-12-04T09:25:47.7509631Z [36;1m  --tag "${TAG}" \[0m
2025-12-04T09:25:47.7509958Z [36;1m  --event-name "${EVENT_NAME}" \[0m
2025-12-04T09:25:47.7510286Z [36;1m  --schedule "${SCHEDULE}" \[0m
2025-12-04T09:25:47.7510611Z [36;1m  --branch "${HEAD_BRANCH}"[0m
2025-12-04T09:25:47.7519707Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:47.7520079Z env:
2025-12-04T09:25:47.7520299Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:47.7520559Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:47.7520870Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:47.7521556Z   GITHUB_TOKEN: ***
2025-12-04T09:25:47.7522248Z   JOB_NAME: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:47.7522985Z   PR_NUMBER: 
2025-12-04T09:25:47.7523206Z   TAG: 
2025-12-04T09:25:47.7523401Z   EVENT_NAME: schedule
2025-12-04T09:25:47.7523638Z   SCHEDULE: 29 8 * * *
2025-12-04T09:25:47.7523864Z   HEAD_BRANCH: main
2025-12-04T09:25:47.7524088Z ##[endgroup]
2025-12-04T09:25:47.7553295Z Workflow: periodic
2025-12-04T09:25:47.7554006Z Job name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:47.9492466Z Setting output keep-going=True
2025-12-04T09:25:47.9492810Z Setting output ci-verbose-test-logs=False
2025-12-04T09:25:47.9493159Z Setting output ci-test-showlocals=False
2025-12-04T09:25:47.9493481Z Setting output ci-no-test-timeout=False
2025-12-04T09:25:47.9493796Z Setting output ci-no-td=False
2025-12-04T09:25:47.9494356Z Setting output ci-td-distributed=False
2025-12-04T09:25:47.9494682Z Setting output is-unstable=False
2025-12-04T09:25:47.9494968Z Setting output reenabled-issues=
2025-12-04T09:25:47.9516386Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": 
"default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": 
"linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], 
"rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T09:25:47.9537782Z Setting output is-test-matrix-empty=False
2025-12-04T09:25:47.9618447Z ##[group]Run echo "Filtered matrix:"
2025-12-04T09:25:47.9618809Z [36;1mecho "Filtered matrix:"[0m
2025-12-04T09:25:47.9639844Z [36;1mecho "{"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 
3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": 
["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", 
"mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}"[0m
2025-12-04T09:25:47.9660953Z 
2025-12-04T09:25:47.9661148Z echo
2025-12-04T09:25:47.9661416Z echo "Is the current job unstable? False"
2025-12-04T09:25:47.9661735Z 
2025-12-04T09:25:47.9661934Z echo
2025-12-04T09:25:47.9662182Z echo "Is keep-going label set? True"
2025-12-04T09:25:47.9662485Z 
2025-12-04T09:25:47.9662683Z echo
2025-12-04T09:25:47.9662908Z echo "Reenabled issues? "
2025-12-04T09:25:47.9672085Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:47.9672454Z env:
2025-12-04T09:25:47.9672664Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:47.9672915Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:47.9673225Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:47.9673590Z ##[endgroup]
2025-12-04T09:25:47.9704064Z Filtered matrix:
2025-12-04T09:25:47.9729599Z {include: [{config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 8, runner: 
linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: 
default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}]}
2025-12-04T09:25:47.9750411Z 
2025-12-04T09:25:47.9750528Z Is the current job unstable? False
2025-12-04T09:25:47.9750742Z 
2025-12-04T09:25:47.9750848Z Is keep-going label set? True
2025-12-04T09:25:47.9751030Z 
2025-12-04T09:25:47.9751122Z Reenabled issues? 
2025-12-04T09:25:47.9782628Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:25:47.9783150Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:25:47.9791533Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:47.9791894Z env:
2025-12-04T09:25:47.9792109Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:47.9792366Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:47.9792665Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:47.9793023Z   JOB_TIMEOUT: 600
2025-12-04T09:25:47.9793248Z ##[endgroup]
2025-12-04T09:25:47.9847371Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:25:47.9847900Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:25:47.9848379Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:25:47.9856552Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:47.9856912Z env:
2025-12-04T09:25:47.9857123Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:47.9857391Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:47.9857696Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:47.9858056Z ##[endgroup]
2025-12-04T09:25:47.9979212Z ##[group]Run set -x
2025-12-04T09:25:47.9979519Z set -x
2025-12-04T09:25:47.9979736Z 
2025-12-04T09:25:47.9979985Z if [[ $TEST_CONFIG == 'multigpu' ]]; then
2025-12-04T09:25:47.9980366Z   TEST_COMMAND=.ci/pytorch/multigpu-test.sh
2025-12-04T09:25:47.9980758Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
2025-12-04T09:25:47.9981113Z   TEST_COMMAND=.ci/onnx/test.sh
2025-12-04T09:25:47.9981422Z else
2025-12-04T09:25:47.9981656Z   TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:25:47.9981957Z fi
2025-12-04T09:25:47.9982145Z 
2025-12-04T09:25:47.9982392Z # Leaving 1GB for the runner and other things
2025-12-04T09:25:47.9982979Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo)
2025-12-04T09:25:47.9983856Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap
2025-12-04T09:25:47.9984567Z # comes from https://github.com/pytorch/test-infra/pull/6058
2025-12-04T09:25:47.9985106Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3))
2025-12-04T09:25:47.9985525Z 
2025-12-04T09:25:47.9985781Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
2025-12-04T09:25:47.9986117Z   SHM_OPTS=
2025-12-04T09:25:47.9986536Z   JENKINS_USER=
2025-12-04T09:25:47.9986889Z   # ensure that docker container cleanly exits in 12 hours
2025-12-04T09:25:47.9987364Z   # if for some reason cleanup action doesn't stop container
2025-12-04T09:25:47.9987765Z   # when job is cancelled
2025-12-04T09:25:47.9988078Z   DOCKER_SHELL_CMD="sleep 12h"
2025-12-04T09:25:47.9988406Z   USED_IMAGE="${DOCKER_IMAGE_S390X}"
2025-12-04T09:25:47.9988721Z else
2025-12-04T09:25:47.9988966Z   SHM_OPTS="--shm-size=${SHM_SIZE}"
2025-12-04T09:25:47.9989308Z   JENKINS_USER="--user jenkins"
2025-12-04T09:25:47.9989615Z   DOCKER_SHELL_CMD=
2025-12-04T09:25:47.9989910Z   USED_IMAGE="${DOCKER_IMAGE}"
2025-12-04T09:25:47.9990204Z fi
2025-12-04T09:25:47.9990397Z 
2025-12-04T09:25:47.9990741Z # detached container should get cleaned up by teardown_ec2_linux
2025-12-04T09:25:47.9991291Z # TODO: Stop building test binaries as part of the build phase
2025-12-04T09:25:47.9991912Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice
2025-12-04T09:25:47.9992440Z # shellcheck disable=SC2086,SC2090
2025-12-04T09:25:47.9992781Z container_name=$(docker run \
2025-12-04T09:25:47.9993087Z   ${GPU_FLAG:-} \
2025-12-04T09:25:47.9993376Z   ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \
2025-12-04T09:25:47.9993715Z   -e BUILD_ENVIRONMENT \
2025-12-04T09:25:47.9994011Z   -e PR_NUMBER \
2025-12-04T09:25:47.9994274Z   -e GITHUB_ACTIONS \
2025-12-04T09:25:47.9994552Z   -e GITHUB_REPOSITORY \
2025-12-04T09:25:47.9994846Z   -e GITHUB_WORKFLOW \
2025-12-04T09:25:47.9995125Z   -e GITHUB_JOB \
2025-12-04T09:25:47.9995382Z   -e GITHUB_RUN_ID \
2025-12-04T09:25:47.9995684Z   -e GITHUB_RUN_NUMBER \
2025-12-04T09:25:47.9996000Z   -e GITHUB_RUN_ATTEMPT \
2025-12-04T09:25:47.9996279Z   -e JOB_ID \
2025-12-04T09:25:47.9996528Z   -e JOB_NAME \
2025-12-04T09:25:47.9996774Z   -e BASE_SHA \
2025-12-04T09:25:47.9997014Z   -e BRANCH \
2025-12-04T09:25:47.9997257Z   -e SHA1 \
2025-12-04T09:25:47.9997496Z   -e AWS_DEFAULT_REGION \
2025-12-04T09:25:47.9997776Z   -e IN_WHEEL_TEST \
2025-12-04T09:25:47.9998041Z   -e SHARD_NUMBER \
2025-12-04T09:25:47.9998307Z   -e TEST_CONFIG \
2025-12-04T09:25:47.9998583Z   -e NUM_TEST_SHARDS \
2025-12-04T09:25:47.9998988Z   -e REENABLED_ISSUES \
2025-12-04T09:25:47.9999282Z   -e CONTINUE_THROUGH_ERROR \
2025-12-04T09:25:47.9999594Z   -e VERBOSE_TEST_LOGS \
2025-12-04T09:25:47.9999878Z   -e TEST_SHOWLOCALS \
2025-12-04T09:25:48.0000158Z   -e NO_TEST_TIMEOUT \
2025-12-04T09:25:48.0000430Z   -e NO_TD \
2025-12-04T09:25:48.0000670Z   -e TD_DISTRIBUTED \
2025-12-04T09:25:48.0000951Z   -e PR_LABELS \
2025-12-04T09:25:48.0001236Z   -e MAX_JOBS="$(nproc --ignore=2)" \
2025-12-04T09:25:48.0001558Z   -e SCCACHE_BUCKET \
2025-12-04T09:25:48.0001845Z   -e SCCACHE_REGION \
2025-12-04T09:25:48.0002129Z   -e XLA_CUDA \
2025-12-04T09:25:48.0002424Z   -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
2025-12-04T09:25:48.0002781Z   -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \
2025-12-04T09:25:48.0003155Z   -e PYTORCH_TEST_RERUN_DISABLED_TESTS \
2025-12-04T09:25:48.0003534Z   -e SKIP_SCCACHE_INITIALIZATION=1 \
2025-12-04T09:25:48.0003873Z   -e HUGGING_FACE_HUB_TOKEN \
2025-12-04T09:25:48.0004207Z   -e VLLM_TEST_HUGGING_FACE_TOKEN \
2025-12-04T09:25:48.0004555Z   -e SCRIBE_GRAPHQL_ACCESS_TOKEN \
2025-12-04T09:25:48.0004887Z   -e DASHBOARD_TAG \
2025-12-04T09:25:48.0005168Z   -e ARTIFACTS_FILE_SUFFIX \
2025-12-04T09:25:48.0005537Z   --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \
2025-12-04T09:25:48.0006066Z   --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \
2025-12-04T09:25:48.0006475Z   --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
2025-12-04T09:25:48.0006879Z   --security-opt seccomp=unconfined \
2025-12-04T09:25:48.0007221Z   --cap-add=SYS_PTRACE \
2025-12-04T09:25:48.0007508Z   --ipc=host \
2025-12-04T09:25:48.0007973Z   ${SHM_OPTS} \
2025-12-04T09:25:48.0008305Z   --tty \
2025-12-04T09:25:48.0008571Z   --detach \
2025-12-04T09:25:48.0008829Z   --name="${container_name}" \
2025-12-04T09:25:48.0009130Z   ${JENKINS_USER} \
2025-12-04T09:25:48.0009465Z   -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
2025-12-04T09:25:48.0009854Z   -w /var/lib/jenkins/workspace \
2025-12-04T09:25:48.0010183Z   "${USED_IMAGE}" \
2025-12-04T09:25:48.0010467Z   ${DOCKER_SHELL_CMD}
2025-12-04T09:25:48.0010726Z )
2025-12-04T09:25:48.0011059Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
2025-12-04T09:25:48.0011484Z 
2025-12-04T09:25:48.0011749Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
2025-12-04T09:25:48.0012332Z   docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt"
2025-12-04T09:25:48.0012868Z fi
2025-12-04T09:25:48.0013090Z 
2025-12-04T09:25:48.0013611Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"
2025-12-04T09:25:48.0022639Z shell: /usr/bin/bash -e {0}
2025-12-04T09:25:48.0022907Z env:
2025-12-04T09:25:48.0023125Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:48.0023397Z   HAS_NVIDIA_GPU: true
2025-12-04T09:25:48.0023698Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:48.0024187Z   BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:25:48.0024608Z   PR_NUMBER: 
2025-12-04T09:25:48.0024842Z   GITHUB_REPOSITORY: pytorch/pytorch
2025-12-04T09:25:48.0025150Z   GITHUB_WORKFLOW: periodic
2025-12-04T09:25:48.0025416Z   GITHUB_JOB: test
2025-12-04T09:25:48.0025645Z   GITHUB_RUN_ID: 19922826259
2025-12-04T09:25:48.0025915Z   GITHUB_RUN_NUMBER: 19107
2025-12-04T09:25:48.0026174Z   GITHUB_RUN_ATTEMPT: 1
2025-12-04T09:25:48.0026407Z   JOB_ID: 57118183212
2025-12-04T09:25:48.0027084Z   JOB_NAME: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:48.0027988Z   BRANCH: main
2025-12-04T09:25:48.0028247Z   SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:25:48.0028615Z   BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:25:48.0028958Z   TEST_CONFIG: default
2025-12-04T09:25:48.0029190Z   SHARD_NUMBER: 2
2025-12-04T09:25:48.0029400Z   NUM_TEST_SHARDS: 8
2025-12-04T09:25:48.0029625Z   EXTRA_FLAGS: 
2025-12-04T09:25:48.0029848Z   OP_BENCHMARK_TESTS: 
2025-12-04T09:25:48.0030077Z   REENABLED_ISSUES: 
2025-12-04T09:25:48.0030321Z   CONTINUE_THROUGH_ERROR: True
2025-12-04T09:25:48.0030596Z   VERBOSE_TEST_LOGS: False
2025-12-04T09:25:48.0030855Z   TEST_SHOWLOCALS: False
2025-12-04T09:25:48.0031101Z   NO_TEST_TIMEOUT: False
2025-12-04T09:25:48.0031358Z   NO_TD: False
2025-12-04T09:25:48.0031581Z   TD_DISTRIBUTED: False
2025-12-04T09:25:48.0031882Z   SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2025-12-04T09:25:48.0032256Z   SCCACHE_REGION: us-east-1
2025-12-04T09:25:48.0032522Z   SHM_SIZE: 2g
2025-12-04T09:25:48.0033319Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:25:48.0034788Z   DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:25:48.0035682Z   XLA_CUDA: 
2025-12-04T09:25:48.0036169Z   XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:25:48.0036628Z   PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1
2025-12-04T09:25:48.0036953Z   PYTORCH_TEST_RERUN_DISABLED_TESTS: 0
2025-12-04T09:25:48.0037265Z   DASHBOARD_TAG: 
2025-12-04T09:25:48.0037694Z   VLLM_TEST_HUGGING_FACE_TOKEN: ***
2025-12-04T09:25:48.0038111Z   HUGGING_FACE_HUB_TOKEN: ***
2025-12-04T09:25:48.0038528Z   SCRIBE_GRAPHQL_ACCESS_TOKEN: ***
2025-12-04T09:25:48.0039007Z   ARTIFACTS_FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T09:25:48.0039494Z ##[endgroup]
2025-12-04T09:25:48.0068226Z + [[ default == \m\u\l\t\i\g\p\u ]]
2025-12-04T09:25:48.0068661Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *onnx* ]]
2025-12-04T09:25:48.0069077Z + TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:25:48.0072557Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo
2025-12-04T09:25:48.0098761Z + TOTAL_AVAILABLE_MEMORY_IN_GB='61.094 '
2025-12-04T09:25:48.0099145Z + TOTAL_MEMORY_WITH_SWAP=64
2025-12-04T09:25:48.0099533Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *\s\3\9\0\x* ]]
2025-12-04T09:25:48.0099962Z + SHM_OPTS=--shm-size=2g
2025-12-04T09:25:48.0100234Z + JENKINS_USER='--user jenkins'
2025-12-04T09:25:48.0100487Z + DOCKER_SHELL_CMD=
2025-12-04T09:25:48.0101284Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:25:48.0109290Z +++ nproc --ignore=2
2025-12-04T09:25:48.0139545Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=61g --memory-swap=64g --env-file=/tmp/github_env_19922826259 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:25:56.8346219Z + container_name=5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T09:25:56.8347466Z + echo DOCKER_CONTAINER_ID=5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T09:25:56.8348301Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *\s\3\9\0\x* ]]
2025-12-04T09:25:56.8353175Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:25:56.8356768Z + docker exec -t 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh'
2025-12-04T09:25:57.3173283Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f)
2025-12-04T09:25:57.6412642Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0)
2025-12-04T09:25:57.6416309Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2)
2025-12-04T09:25:57.6421929Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3)
2025-12-04T09:25:57.6426377Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8)
2025-12-04T09:25:57.6429835Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6)
2025-12-04T09:25:57.6434231Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0)
2025-12-04T09:25:57.6446915Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0)
2025-12-04T09:25:57.6832438Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4)
2025-12-04T09:25:57.6851440Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0)
2025-12-04T09:25:57.6907981Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3)
2025-12-04T09:25:58.0659287Z Installing collected packages: torch
2025-12-04T09:26:09.5078058Z Successfully installed torch-2.10.0a0+gitffd9b0f
2025-12-04T09:26:09.5795537Z + export TERM=vt100
2025-12-04T09:26:09.5795780Z + TERM=vt100
2025-12-04T09:26:09.5799222Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:26:09.5811321Z + source .ci/pytorch/common.sh
2025-12-04T09:26:09.5815337Z +++ dirname .ci/pytorch/common.sh
2025-12-04T09:26:09.5824853Z ++ source .ci/pytorch/common_utils.sh
2025-12-04T09:26:09.5826214Z +++ declare -f -t trap_add
2025-12-04T09:26:09.5832261Z ++ set -ex -o pipefail
2025-12-04T09:26:09.5832689Z ++ [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *rocm* ]]
2025-12-04T09:26:09.5833230Z ++ BUILD_TEST_LIBTORCH=0
2025-12-04T09:26:09.5836341Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:26:09.6122587Z + source .ci/pytorch/common-build.sh
2025-12-04T09:26:09.6124508Z ++ [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *win-* ]]
2025-12-04T09:26:09.6131044Z ++++ dirname .ci/pytorch/common-build.sh
2025-12-04T09:26:09.6141474Z +++ cd .ci/pytorch
2025-12-04T09:26:09.6141819Z +++ pwd -P
2025-12-04T09:26:09.6213073Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch
2025-12-04T09:26:09.6213564Z ++ [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *-pch* ]]
2025-12-04T09:26:09.6213963Z ++ which sccache
2025-12-04T09:26:09.6280732Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]]
2025-12-04T09:26:09.6281102Z ++ sccache --stop-server
2025-12-04T09:26:09.6311704Z ++ true
2025-12-04T09:26:09.6311966Z ++ rm -f /var/lib/jenkins/sccache_error.log
2025-12-04T09:26:09.6322912Z ++ trap_add sccache_epilogue EXIT
2025-12-04T09:26:09.6323218Z ++ trap_add_cmd=sccache_epilogue
2025-12-04T09:26:09.6323590Z ++ shift
2025-12-04T09:26:09.6323795Z ++ for trap_add_name in "$@"
2025-12-04T09:26:09.6330698Z ++++ trap -p EXIT
2025-12-04T09:26:09.6334229Z +++ eval 'extract_trap_cmd '
2025-12-04T09:26:09.6334633Z ++++ extract_trap_cmd
2025-12-04T09:26:09.6334944Z ++++ printf '%s\n' ''
2025-12-04T09:26:09.6335276Z +++ printf '%s\n' sccache_epilogue
2025-12-04T09:26:09.6336826Z ++ trap -- '
2025-12-04T09:26:09.6337140Z sccache_epilogue' EXIT
2025-12-04T09:26:09.6337443Z ++ [[ -n 1 ]]
2025-12-04T09:26:09.6337934Z ++ echo 'Skipping sccache server initialization, setting environment variables'
2025-12-04T09:26:09.6338575Z Skipping sccache server initialization, setting environment variables
2025-12-04T09:26:09.6339408Z ++ export SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:26:09.6339688Z ++ SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:26:09.6340029Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:26:09.6340469Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:26:09.6345292Z ++ export RUST_LOG=sccache::server=error
2025-12-04T09:26:09.6345626Z ++ RUST_LOG=sccache::server=error
2025-12-04T09:26:09.6345912Z ++ sccache --zero-stats
2025-12-04T09:26:10.0159711Z Statistics zeroed.
2025-12-04T09:26:10.0168685Z ++ which ccache
2025-12-04T09:26:10.0249864Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *rocm* ]]
2025-12-04T09:26:10.0250418Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *s390x* ]]
2025-12-04T09:26:10.0250848Z + [[ -d /var/lib/jenkins/workspace ]]
2025-12-04T09:26:10.0252747Z ++ stat -c %u /var/lib/jenkins/workspace
2025-12-04T09:26:10.0269832Z + WORKSPACE_ORIGINAL_OWNER_ID=1000
2025-12-04T09:26:10.0270182Z + trap_add cleanup_workspace EXIT
2025-12-04T09:26:10.0270489Z + trap_add_cmd=cleanup_workspace
2025-12-04T09:26:10.0270742Z + shift
2025-12-04T09:26:10.0270941Z + for trap_add_name in "$@"
2025-12-04T09:26:10.0276976Z +++ trap -p EXIT
2025-12-04T09:26:10.0288312Z ++ eval 'extract_trap_cmd trap -- '\''
2025-12-04T09:26:10.0288711Z sccache_epilogue'\'' EXIT'
2025-12-04T09:26:10.0288983Z +++ extract_trap_cmd trap -- '
2025-12-04T09:26:10.0289258Z sccache_epilogue' EXIT
2025-12-04T09:26:10.0289497Z +++ printf '%s\n' '
2025-12-04T09:26:10.0289726Z sccache_epilogue'
2025-12-04T09:26:10.0289969Z ++ printf '%s\n' cleanup_workspace
2025-12-04T09:26:10.0290261Z + trap -- '
2025-12-04T09:26:10.0290459Z sccache_epilogue
2025-12-04T09:26:10.0290688Z cleanup_workspace' EXIT
2025-12-04T09:26:10.0290984Z + sudo chown -R jenkins /var/lib/jenkins/workspace
2025-12-04T09:26:11.0776364Z + git config --global --add safe.directory /var/lib/jenkins/workspace
2025-12-04T09:26:11.0798849Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]]
2025-12-04T09:26:11.0801844Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))'
2025-12-04T09:26:11.5242450Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:26:11.5243063Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']'
2025-12-04T09:26:11.5248265Z +++ realpath .ci/pytorch/test.sh
2025-12-04T09:26:11.5260980Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh
2025-12-04T09:26:11.5270353Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch
2025-12-04T09:26:11.5271431Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:26:11.5272028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace
2025-12-04T09:26:11.5272486Z + patch -p4
2025-12-04T09:26:11.5286650Z patching file cudadrv/driver.py
2025-12-04T09:26:11.5287845Z Hunk #1 succeeded at 357 (offset -8 lines).
2025-12-04T09:26:11.5359827Z + popd
2025-12-04T09:26:11.5360036Z ~/workspace
2025-12-04T09:26:11.5360268Z + echo 'Environment variables:'
2025-12-04T09:26:11.5360548Z Environment variables:
2025-12-04T09:26:11.5360782Z + env
2025-12-04T09:26:11.5370843Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:26:11.5371469Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:26:11.5372015Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:26:11.5372841Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:26:11.5373232Z HOSTNAME=5d0babf71ea3
2025-12-04T09:26:11.5373928Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.5374778Z GITHUB_ACTION=__run_3
2025-12-04T09:26:11.5375111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:26:11.5375494Z GITHUB_RUN_NUMBER=19107
2025-12-04T09:26:11.5375760Z TEST_CONFIG=default
2025-12-04T09:26:11.5376001Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:26:11.5376313Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:26:11.5376618Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:26:11.5377254Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:26:11.5377537Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:26:11.5377805Z GITHUB_REF_TYPE=branch
2025-12-04T09:26:11.5378089Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.5378407Z XLA_CUDA=
2025-12-04T09:26:11.5378631Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:26:11.5379023Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:26:11.5379688Z ***
2025-12-04T09:26:11.5379888Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:26:11.5380169Z GITHUB_ACTIONS=true
2025-12-04T09:26:11.5380417Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:26:11.5380742Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:26:11.5381128Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.5381499Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.5382038Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main
2025-12-04T09:26:11.5382563Z UCC_HOME=/usr
2025-12-04T09:26:11.5382775Z VERBOSE_TEST_LOGS=False
2025-12-04T09:26:11.5383017Z GITHUB_REF=refs/heads/main
2025-12-04T09:26:11.5383265Z SHARD_NUMBER=2
2025-12-04T09:26:11.5383480Z GITHUB_REF_PROTECTED=true
2025-12-04T09:26:11.5383726Z HOME=/var/lib/jenkins
2025-12-04T09:26:11.5384012Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:26:11.5384328Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:26:11.5384657Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:26:11.5384985Z USE_SYSTEM_NCCL=1
2025-12-04T09:26:11.5385198Z NUM_TEST_SHARDS=8
2025-12-04T09:26:11.5385404Z UCX_HOME=/usr
2025-12-04T09:26:11.5385971Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.5387063Z JOB_NAME=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:26:11.5388127Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.5388949Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:26:11.5389452Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:26:11.5389701Z DASHBOARD_TAG=
2025-12-04T09:26:11.5389910Z GITHUB_RUN_ID=19922826259
2025-12-04T09:26:11.5390154Z INSTALLED_OPENBLAS=
2025-12-04T09:26:11.5390771Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.5391506Z GITHUB_ACTOR=huydhn
2025-12-04T09:26:11.5391884Z PR_NUMBER=
2025-12-04T09:26:11.5392083Z DESIRED_CUDA=12.8.1
2025-12-04T09:26:11.5392303Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:26:11.5392536Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:26:11.5392854Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:26:11.5393186Z TERM=vt100
2025-12-04T09:26:11.5393377Z INSTALLED_VISION=yes
2025-12-04T09:26:11.5393598Z BRANCH=main
2025-12-04T09:26:11.5393809Z SCCACHE_REGION=us-east-1
2025-12-04T09:26:11.5394057Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:26:11.5394330Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:26:11.5394577Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:26:11.5395086Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:26:11.5395667Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:26:11.5396008Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:26:11.5396328Z REENABLED_ISSUES=
2025-12-04T09:26:11.5396532Z DOCS=
2025-12-04T09:26:11.5396713Z SHLVL=1
2025-12-04T09:26:11.5396894Z MAX_JOBS=14
2025-12-04T09:26:11.5397093Z GITHUB_ACTOR_ID=475357
2025-12-04T09:26:11.5397420Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.5397795Z GITHUB_REF_NAME=main
2025-12-04T09:26:11.5398153Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:26:11.5398576Z GITHUB_JOB=test
2025-12-04T09:26:11.5398791Z NO_TEST_TIMEOUT=False
2025-12-04T09:26:11.5399110Z TD_DISTRIBUTED=False
2025-12-04T09:26:11.5399360Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:26:11.5399654Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:26:11.5399895Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:26:11.5400155Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:26:11.5400939Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:26:11.5401744Z GITHUB_BASE_REF=
2025-12-04T09:26:11.5401958Z INSTALLED_ACL=
2025-12-04T09:26:11.5402358Z ARTIFACTS_FILE_SUFFIX=test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T09:26:11.5402824Z CI=true
2025-12-04T09:26:11.5403023Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:26:11.5403324Z RUST_LOG=sccache::server=error
2025-12-04T09:26:11.5403585Z JOB_ID=57118183212
2025-12-04T09:26:11.5403796Z GITHUB_HEAD_REF=
2025-12-04T09:26:11.5404012Z GITHUB_ACTION_REF=
2025-12-04T09:26:11.5404284Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:26:11.5404621Z TEST_SHOWLOCALS=False
2025-12-04T09:26:11.5404861Z GITHUB_WORKFLOW=periodic
2025-12-04T09:26:11.5405123Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:26:11.5405748Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.5406392Z NO_TD=False
2025-12-04T09:26:11.5406613Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:26:11.5406898Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:26:11.5407204Z _=/usr/bin/env
2025-12-04T09:26:11.5407554Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:26:11.5408437Z ++ python -c 'import site; print(site.getsitepackages()[0])'
2025-12-04T09:26:11.5517737Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch
2025-12-04T09:26:11.5518543Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin
2025-12-04T09:26:11.5519343Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib
2025-12-04T09:26:11.5520047Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test
2025-12-04T09:26:11.5520515Z + BUILD_DIR=build
2025-12-04T09:26:11.5520781Z + BUILD_RENAMED_DIR=build_renamed
2025-12-04T09:26:11.5521173Z + BUILD_BIN_DIR=build/bin
2025-12-04T09:26:11.5521431Z + SHARD_NUMBER=2
2025-12-04T09:26:11.5521688Z + NUM_TEST_SHARDS=8
2025-12-04T09:26:11.5521932Z + export TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:26:11.5522264Z + TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:26:11.5522541Z + export VALGRIND=ON
2025-12-04T09:26:11.5522964Z + VALGRIND=ON
2025-12-04T09:26:11.5523301Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *clang9* ]]
2025-12-04T09:26:11.5523806Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *xpu* ]]
2025-12-04T09:26:11.5524194Z + detect_cuda_arch
2025-12-04T09:26:11.5524519Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]]
2025-12-04T09:26:11.5524910Z + command -v nvidia-smi
2025-12-04T09:26:11.5525150Z /usr/bin/nvidia-smi
2025-12-04T09:26:11.5528932Z ++ nvidia-smi --query-gpu=compute_cap --format=csv
2025-12-04T09:26:11.5529389Z ++ tail -n 1
2025-12-04T09:26:11.5809844Z + TORCH_CUDA_ARCH_LIST=8.6
2025-12-04T09:26:11.5810275Z + export TORCH_CUDA_ARCH_LIST
2025-12-04T09:26:11.5810676Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *s390x* ]]
2025-12-04T09:26:11.5811072Z + [[ 0 == \1 ]]
2025-12-04T09:26:11.5811276Z + [[ True == \1 ]]
2025-12-04T09:26:11.5811596Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *bazel* ]]
2025-12-04T09:26:11.5814808Z ++ realpath build/custom_test_artifacts
2025-12-04T09:26:11.6217458Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts
2025-12-04T09:26:11.6217971Z + [[ -n '' ]]
2025-12-04T09:26:11.6218193Z + echo 'Environment variables'
2025-12-04T09:26:11.6218462Z Environment variables
2025-12-04T09:26:11.6218689Z + env
2025-12-04T09:26:11.6372262Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:26:11.6373432Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:26:11.6374955Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:26:11.6376228Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:26:11.6377140Z HOSTNAME=5d0babf71ea3
2025-12-04T09:26:11.6377907Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.6378698Z GITHUB_ACTION=__run_3
2025-12-04T09:26:11.6378952Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:26:11.6379317Z GITHUB_RUN_NUMBER=19107
2025-12-04T09:26:11.6379738Z TEST_CONFIG=default
2025-12-04T09:26:11.6379980Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:26:11.6380287Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:26:11.6380590Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:26:11.6380996Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:26:11.6381277Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:26:11.6381535Z GITHUB_REF_TYPE=branch
2025-12-04T09:26:11.6381777Z TORCH_CUDA_ARCH_LIST=8.6
2025-12-04T09:26:11.6382063Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.6382459Z XLA_CUDA=
2025-12-04T09:26:11.6382762Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:26:11.6383509Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:26:11.6383916Z ***
2025-12-04T09:26:11.6384156Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:26:11.6384504Z GITHUB_ACTIONS=true
2025-12-04T09:26:11.6384808Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:26:11.6385241Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:26:11.6385767Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.6386268Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.6386881Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main
2025-12-04T09:26:11.6387461Z UCC_HOME=/usr
2025-12-04T09:26:11.6387687Z TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:26:11.6387940Z VERBOSE_TEST_LOGS=False
2025-12-04T09:26:11.6388180Z GITHUB_REF=refs/heads/main
2025-12-04T09:26:11.6388429Z SHARD_NUMBER=2
2025-12-04T09:26:11.6388637Z GITHUB_REF_PROTECTED=true
2025-12-04T09:26:11.6388888Z HOME=/var/lib/jenkins
2025-12-04T09:26:11.6389157Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:26:11.6389468Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:26:11.6389836Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:26:11.6390159Z USE_SYSTEM_NCCL=1
2025-12-04T09:26:11.6390375Z NUM_TEST_SHARDS=8
2025-12-04T09:26:11.6390582Z UCX_HOME=/usr
2025-12-04T09:26:11.6391361Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.6392516Z JOB_NAME=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:26:11.6393584Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.6394394Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:26:11.6394899Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:26:11.6395145Z DASHBOARD_TAG=
2025-12-04T09:26:11.6395357Z GITHUB_RUN_ID=19922826259
2025-12-04T09:26:11.6395594Z INSTALLED_OPENBLAS=
2025-12-04T09:26:11.6396205Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.6396886Z GITHUB_ACTOR=huydhn
2025-12-04T09:26:11.6397098Z PR_NUMBER=
2025-12-04T09:26:11.6397295Z DESIRED_CUDA=12.8.1
2025-12-04T09:26:11.6397519Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:26:11.6397730Z VALGRIND=ON
2025-12-04T09:26:11.6397942Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:26:11.6398269Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:26:11.6398595Z TERM=vt100
2025-12-04T09:26:11.6398798Z INSTALLED_VISION=yes
2025-12-04T09:26:11.6399012Z BRANCH=main
2025-12-04T09:26:11.6399212Z SCCACHE_REGION=us-east-1
2025-12-04T09:26:11.6399570Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:26:11.6399827Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:26:11.6400074Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:26:11.6400584Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:26:11.6401169Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:26:11.6401507Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:26:11.6401835Z REENABLED_ISSUES=
2025-12-04T09:26:11.6402036Z DOCS=
2025-12-04T09:26:11.6402211Z SHLVL=1
2025-12-04T09:26:11.6402388Z MAX_JOBS=14
2025-12-04T09:26:11.6402590Z GITHUB_ACTOR_ID=475357
2025-12-04T09:26:11.6402909Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:26:11.6403281Z GITHUB_REF_NAME=main
2025-12-04T09:26:11.6403643Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:26:11.6404051Z GITHUB_JOB=test
2025-12-04T09:26:11.6404269Z NO_TEST_TIMEOUT=False
2025-12-04T09:26:11.6404499Z TD_DISTRIBUTED=False
2025-12-04T09:26:11.6404750Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:26:11.6405035Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:26:11.6405283Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:26:11.6405536Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:26:11.6406311Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:26:11.6407120Z GITHUB_BASE_REF=
2025-12-04T09:26:11.6407333Z INSTALLED_ACL=
2025-12-04T09:26:11.6408011Z ARTIFACTS_FILE_SUFFIX=test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T09:26:11.6408520Z CI=true
2025-12-04T09:26:11.6408724Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:26:11.6409027Z RUST_LOG=sccache::server=error
2025-12-04T09:26:11.6409285Z JOB_ID=57118183212
2025-12-04T09:26:11.6409495Z GITHUB_HEAD_REF=
2025-12-04T09:26:11.6409695Z GITHUB_ACTION_REF=
2025-12-04T09:26:11.6409962Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:26:11.6410298Z TEST_SHOWLOCALS=False
2025-12-04T09:26:11.6410536Z GITHUB_WORKFLOW=periodic
2025-12-04T09:26:11.6410796Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:26:11.6411419Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_198bc67c-0846-46e5-96ef-ef7f70bb4eea
2025-12-04T09:26:11.6412051Z NO_TD=False
2025-12-04T09:26:11.6412252Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:26:11.6412541Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:26:11.6412985Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:26:11.6413395Z _=/usr/bin/env
2025-12-04T09:26:11.6413754Z + echo 'Testing pytorch'
2025-12-04T09:26:11.6414001Z Testing pytorch
2025-12-04T09:26:11.6414206Z + export LANG=C.UTF-8
2025-12-04T09:26:11.6414431Z + LANG=C.UTF-8
2025-12-04T09:26:11.6414630Z + PR_NUMBER=
2025-12-04T09:26:11.6414835Z + [[ default == \d\e\f\a\u\l\t ]]
2025-12-04T09:26:11.6415115Z + export CUDA_VISIBLE_DEVICES=0
2025-12-04T09:26:11.6415380Z + CUDA_VISIBLE_DEVICES=0
2025-12-04T09:26:11.6415630Z + export HIP_VISIBLE_DEVICES=0
2025-12-04T09:26:11.6415893Z + HIP_VISIBLE_DEVICES=0
2025-12-04T09:26:11.6416139Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]]
2025-12-04T09:26:11.6416433Z + [[ default == \s\l\o\w ]]
2025-12-04T09:26:11.6416822Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *slow-gradcheck* ]]
2025-12-04T09:26:11.6417290Z + export PYTORCH_TEST_WITH_SLOW_GRADCHECK=1
2025-12-04T09:26:11.6417619Z + PYTORCH_TEST_WITH_SLOW_GRADCHECK=1
2025-12-04T09:26:11.6417928Z + export PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:26:11.6418257Z + PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:26:11.6418647Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]]
2025-12-04T09:26:11.6419177Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:26:11.6419517Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:26:11.6419814Z + [[ default == *crossref* ]]
2025-12-04T09:26:11.6420175Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *rocm* ]]
2025-12-04T09:26:11.6420789Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *xpu* ]]
2025-12-04T09:26:11.6421292Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *-bazel-* ]]
2025-12-04T09:26:11.6421705Z + pip_install ninja==1.10.2
2025-12-04T09:26:11.6422051Z + pip_install_pkg='python3 -m pip install --progress-bar off'
2025-12-04T09:26:11.6422556Z + python3 -m pip install --progress-bar off ninja==1.10.2
2025-12-04T09:26:12.1251973Z Collecting ninja==1.10.2
2025-12-04T09:26:12.1500706Z   Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB)
2025-12-04T09:26:12.1861461Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
2025-12-04T09:26:12.5967232Z Installing collected packages: ninja
2025-12-04T09:26:12.5967564Z   Attempting uninstall: ninja
2025-12-04T09:26:12.5975027Z     Found existing installation: ninja 1.11.1.4
2025-12-04T09:26:12.5999143Z     Uninstalling ninja-1.11.1.4:
2025-12-04T09:26:12.6110508Z       Successfully uninstalled ninja-1.11.1.4
2025-12-04T09:26:12.6863148Z Successfully installed ninja-1.10.2
2025-12-04T09:26:12.7453711Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:26:12.7455404Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:26:12.7456643Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *aarch64* ]]
2025-12-04T09:26:12.7457177Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *asan* ]]
2025-12-04T09:26:12.7457689Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *-debug* ]]
2025-12-04T09:26:12.7458207Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *-bazel-* ]]
2025-12-04T09:26:12.7458891Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck. Expect the assertion to pass'
2025-12-04T09:26:12.7459829Z We are not in debug mode: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck. Expect the assertion to pass
2025-12-04T09:26:12.7460371Z + cd test
2025-12-04T09:26:12.7460702Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)'
2025-12-04T09:26:14.4177361Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
2025-12-04T09:26:14.4177732Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]]
2025-12-04T09:26:14.4178084Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]]
2025-12-04T09:26:14.4182310Z + DYNAMO_BENCHMARK_FLAGS=()
2025-12-04T09:26:14.4182998Z + [[ default == *pr_time_benchmarks* ]]
2025-12-04T09:26:14.4183359Z + [[ default == *dynamo_eager* ]]
2025-12-04T09:26:14.4183654Z + [[ default == *aot_eager* ]]
2025-12-04T09:26:14.4183927Z + [[ default == *aot_inductor* ]]
2025-12-04T09:26:14.4184223Z + [[ default == *max_autotune_inductor* ]]
2025-12-04T09:26:14.4184530Z + [[ default == *inductor* ]]
2025-12-04T09:26:14.4184789Z + [[ default == *dynamic* ]]
2025-12-04T09:26:14.4185061Z + [[ default == *cpu* ]]
2025-12-04T09:26:14.4185308Z + [[ default == *xpu* ]]
2025-12-04T09:26:14.4185579Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
2025-12-04T09:26:14.4217187Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *libtorch* ]]
2025-12-04T09:26:14.4217740Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *-bazel-* ]]
2025-12-04T09:26:14.4221114Z + cd test
2025-12-04T09:26:14.4221695Z + python -c 'import torch; print(torch.__config__.show())'
2025-12-04T09:26:16.0802988Z PyTorch built with:
2025-12-04T09:26:16.0803281Z   - GCC 11.4
2025-12-04T09:26:16.0803497Z   - C++ Version: 201703
2025-12-04T09:26:16.0804095Z   - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:26:16.0804823Z   - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:26:16.0805258Z   - OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:26:16.0805588Z   - LAPACK is enabled (usually provided by MKL)
2025-12-04T09:26:16.0806149Z   - NNPACK is enabled
2025-12-04T09:26:16.0806403Z   - CPU capability usage: AVX2
2025-12-04T09:26:16.0806684Z   - CUDA Runtime 12.8
2025-12-04T09:26:16.0807023Z   - NVCC architecture flags: -gencode;arch=compute_86,code=sm_86
2025-12-04T09:26:16.0807428Z   - CuDNN 91.0.2  (built against CUDA 12.9)
2025-12-04T09:26:16.0812770Z   - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 
2025-12-04T09:26:16.0818068Z 
2025-12-04T09:26:16.4547183Z + cd test
2025-12-04T09:26:16.4547682Z + python -c 'import torch; print(torch.__config__.parallel_info())'
2025-12-04T09:26:17.7750319Z ATen/Parallel:
2025-12-04T09:26:17.7750631Z 	at::get_num_threads() : 8
2025-12-04T09:26:17.7750913Z 	at::get_num_interop_threads() : 16
2025-12-04T09:26:17.7751220Z OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:26:17.7751507Z 	omp_get_max_threads() : 8
2025-12-04T09:26:17.7752097Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:26:17.7752690Z 	mkl_get_max_threads() : 8
2025-12-04T09:26:17.7753066Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:26:17.7753509Z std::thread::hardware_concurrency() : 16
2025-12-04T09:26:17.7753814Z Environment variables:
2025-12-04T09:26:17.7754411Z 	OMP_NUM_THREADS : [not set]
2025-12-04T09:26:17.7764440Z 	MKL_NUM_THREADS : [not set]
2025-12-04T09:26:17.7765119Z ATen parallel backend: OpenMP
2025-12-04T09:26:17.7765312Z 
2025-12-04T09:26:18.1021468Z + [[ default == *numpy_2* ]]
2025-12-04T09:26:18.1021947Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *aarch64* ]]
2025-12-04T09:26:18.1022370Z + [[ default == *backward* ]]
2025-12-04T09:26:18.1022680Z + [[ default == *libtorch_agnostic_targetting* ]]
2025-12-04T09:26:18.1023017Z + [[ default == *xla* ]]
2025-12-04T09:26:18.1023265Z + [[ default == *vllm* ]]
2025-12-04T09:26:18.1023537Z + [[ default == *executorch* ]]
2025-12-04T09:26:18.1023815Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]]
2025-12-04T09:26:18.1024125Z + [[ default == \q\u\a\n\t\i\z\a\t\i\o\n ]]
2025-12-04T09:26:18.1024567Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *libtorch* ]]
2025-12-04T09:26:18.1025024Z + [[ default == distributed ]]
2025-12-04T09:26:18.1025307Z + [[ default == *operator_benchmark* ]]
2025-12-04T09:26:18.1025626Z + [[ default == *operator_microbenchmark* ]]
2025-12-04T09:26:18.1025975Z + [[ default == *attention_microbenchmark* ]]
2025-12-04T09:26:18.1026321Z + [[ default == *inductor_distributed* ]]
2025-12-04T09:26:18.1026629Z + [[ default == *inductor-halide* ]]
2025-12-04T09:26:18.1026930Z + [[ default == *inductor-pallas* ]]
2025-12-04T09:26:18.1027344Z + [[ default == *inductor-triton-cpu* ]]
2025-12-04T09:26:18.1027761Z + [[ default == *inductor-micro-benchmark* ]]
2025-12-04T09:26:18.1028112Z + [[ default == *aoti_cross_compile_for_windows* ]]
2025-12-04T09:26:18.1028824Z + [[ default == *huggingface* ]]
2025-12-04T09:26:18.1029099Z + [[ default == *timm* ]]
2025-12-04T09:26:18.1029348Z + [[ default == cachebench ]]
2025-12-04T09:26:18.1029623Z + [[ default == verify_cachebench ]]
2025-12-04T09:26:18.1029915Z + [[ default == *torchbench* ]]
2025-12-04T09:26:18.1030193Z + [[ default == *inductor_cpp_wrapper* ]]
2025-12-04T09:26:18.1030509Z + [[ default == *inductor_core* ]]
2025-12-04T09:26:18.1030790Z + [[ default == *inductor* ]]
2025-12-04T09:26:18.1031047Z + [[ default == *einops* ]]
2025-12-04T09:26:18.1031311Z + [[ default == *dynamo_core* ]]
2025-12-04T09:26:18.1031591Z + [[ default == *dynamo_wrapped* ]]
2025-12-04T09:26:18.1031972Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *rocm* ]]
2025-12-04T09:26:18.1032363Z + [[ 2 == 1 ]]
2025-12-04T09:26:18.1032563Z + [[ 2 == 2 ]]
2025-12-04T09:26:18.1032779Z + [[ 8 -gt 1 ]]
2025-12-04T09:26:18.1032985Z + install_torchvision
2025-12-04T09:26:18.1033225Z + local orig_preload
2025-12-04T09:26:18.1033459Z + local commit
2025-12-04T09:26:18.1033669Z ++ get_pinned_commit vision
2025-12-04T09:26:18.1033942Z ++ cat .github/ci_commit_pins/vision.txt
2025-12-04T09:26:18.1047740Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:26:18.1048071Z + orig_preload=
2025-12-04T09:26:18.1048283Z + '[' -n '' ']'
2025-12-04T09:26:18.1048600Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]]
2025-12-04T09:26:18.1049000Z + export FORCE_CUDA=1
2025-12-04T09:26:18.1049224Z + FORCE_CUDA=1
2025-12-04T09:26:18.1049440Z + export WITH_CUDA=1
2025-12-04T09:26:18.1049662Z + WITH_CUDA=1
2025-12-04T09:26:18.1050222Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision
2025-12-04T09:26:18.1051115Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:26:18.1051682Z + local wheel_dir=dist/vision
2025-12-04T09:26:18.1051940Z + local found_whl=0
2025-12-04T09:26:18.1052184Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:26:18.1052480Z + [[ -f dist/vision/*.whl ]]
2025-12-04T09:26:18.1052727Z + '[' 0 == 0 ']'
2025-12-04T09:26:18.1053402Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:26:18.4295568Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:26:18.4299528Z   Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-p9mt7q5u
2025-12-04T09:26:18.4473952Z   Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-p9mt7q5u
2025-12-04T09:26:20.2135383Z   Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e'
2025-12-04T09:26:20.2160387Z   Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:26:20.3515998Z   Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:26:22.4736793Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T09:26:22.4769030Z Building wheels for collected packages: torchvision
2025-12-04T09:27:38.0803062Z   Building wheel for torchvision (pyproject.toml) ... done
2025-12-04T09:27:38.0834177Z   Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1786563 sha256=7874054a75ed282a987b4c93ab0d0596d77e962e2e31afef205dc3d79b7b2778
2025-12-04T09:27:38.0837721Z   Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac
2025-12-04T09:27:38.0873283Z Successfully built torchvision
2025-12-04T09:27:38.1980254Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:27:38.1980812Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:27:38.1981494Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl')
2025-12-04T09:27:38.1981941Z + local args
2025-12-04T09:27:38.1982328Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]]
2025-12-04T09:27:38.1982816Z + for path in "${args[@]}"
2025-12-04T09:27:38.1983297Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl'
2025-12-04T09:27:38.1983991Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:27:38.1984787Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:27:38.5300094Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:27:38.5396402Z Installing collected packages: torchvision
2025-12-04T09:27:39.0106028Z Successfully installed torchvision-0.25.0a0+617079d
2025-12-04T09:27:39.0499352Z + '[' -n '' ']'
2025-12-04T09:27:39.0499628Z + test_python_shard 2
2025-12-04T09:27:39.0499936Z + [[ -z 8 ]]
2025-12-04T09:27:39.0500774Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 2 8 --verbose --upload-artifacts-while-running
2025-12-04T09:27:42.1611240Z Excluding doctests Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1615285Z Excluding test_meta Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1615996Z Excluding test_hub Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1616689Z Excluding test_fx Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1617400Z Excluding test_decomp Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1618213Z Excluding test_cpp_extensions_jit Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1619085Z Excluding test_jit Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1619829Z Excluding test_matmul_cuda Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1620565Z Excluding test_ops Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1621566Z Excluding test_ops_jit Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1622360Z Excluding dynamo/test_recompile_ux Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1623262Z Excluding inductor/test_compiled_optimizers Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1624188Z Excluding inductor/test_cutlass_backend Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1625077Z Excluding inductor/test_max_autotune Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1625975Z Excluding inductor/test_select_algorithm Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:42.1626840Z Excluding inductor/test_smoke Running in slow gradcheck mode, skipping tests that don't use gradcheck.
2025-12-04T09:27:44.1255196Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json
2025-12-04T09:27:44.1828264Z Ignoring disabled issues:  ['']
2025-12-04T09:27:44.1929178Z Found test times from artifacts
2025-12-04T09:27:44.2331977Z Found test times from artifacts
2025-12-04T09:27:44.2345562Z Running all tests
2025-12-04T09:27:44.2978926Z Running parallel tests on 1 processes
2025-12-04T09:27:44.2985486Z Name: tests to run (est. time: 140.51min)
2025-12-04T09:27:44.2986159Z   Serial tests (80):
2025-12-04T09:27:44.2986404Z     inductor/test_aot_inductor 2/5
2025-12-04T09:27:44.2986763Z     inductor/test_torchinductor_codegen_dynamic_shapes 1/4
2025-12-04T09:27:44.2987158Z     inductor/test_torchinductor_opinfo 4/14
2025-12-04T09:27:44.2987493Z     inductor/test_torchinductor_opinfo 12/14
2025-12-04T09:27:44.2987817Z     inductor/test_flex_attention 6/6
2025-12-04T09:27:44.2988105Z     inductor/test_fp8 1/1
2025-12-04T09:27:44.2988364Z     dynamo/test_model_output 1/1
2025-12-04T09:27:44.2988644Z     inductor/test_triton_kernels 1/1
2025-12-04T09:27:44.2988953Z     inductor/test_loop_ordering 1/1
2025-12-04T09:27:44.2989241Z     export/test_serdes 1/1
2025-12-04T09:27:44.2989526Z     inductor/test_scatter_optimization 1/1
2025-12-04T09:27:44.2989841Z     inductor/test_padding 1/1
2025-12-04T09:27:44.2990114Z     dynamo/test_callback 1/1
2025-12-04T09:27:44.2990386Z     inductor/test_custom_op_autotune 1/1
2025-12-04T09:27:44.2990679Z     test_cuda 1/1
2025-12-04T09:27:44.2990896Z     test_sparse 1/1
2025-12-04T09:27:44.2991150Z     test_ci_sanity_check_fail 1/1
2025-12-04T09:27:44.2991525Z     test_ops_fwd_gradients 6/12
2025-12-04T09:27:44.2991793Z     test_ops_gradients 2/10
2025-12-04T09:27:44.2992102Z     test_ops_gradients 10/10
2025-12-04T09:27:44.2992401Z     functorch/test_ops 3/6
2025-12-04T09:27:44.2992654Z     dynamo/test_after_aot 1/1
2025-12-04T09:27:44.2992970Z     inductor/test_snode_runtime 1/1
2025-12-04T09:27:44.2993395Z     inductor/test_compiled_autograd 1/2
2025-12-04T09:27:44.2993736Z     test_testing 1/1
2025-12-04T09:27:44.2993986Z     inductor/test_autoheuristic 1/1
2025-12-04T09:27:44.2994287Z     inductor/test_cutedsl_template 1/1
2025-12-04T09:27:44.2994593Z     inductor/test_benchmark_fusion 1/1
2025-12-04T09:27:44.2994886Z     inductor/test_remote_cache 1/1
2025-12-04T09:27:44.2995194Z     inductor/test_coordinate_descent_tuner 1/1
2025-12-04T09:27:44.2995621Z     inductor/test_inplace_padding 1/1
2025-12-04T09:27:44.2995921Z     inductor/test_cudacodecache 1/1
2025-12-04T09:27:44.2996301Z     inductor/test_minifier_utils 1/1
2025-12-04T09:27:44.2996596Z     inductor/test_debug_trace 1/1
2025-12-04T09:27:44.2996929Z     export/test_tree_utils 1/1
2025-12-04T09:27:44.2997307Z     inductor/test_triton_wrapper 1/1
2025-12-04T09:27:44.2997640Z     inductor/test_static_cuda_launcher 1/1
2025-12-04T09:27:44.2997964Z     inductor/test_provenance_tracing 1/1
2025-12-04T09:27:44.2998277Z     inductor/test_memory_planning 1/1
2025-12-04T09:27:44.2998581Z     export/test_cpp_serdes 1/1
2025-12-04T09:27:44.2998853Z     inductor/test_control_flow 2/4
2025-12-04T09:27:44.2999324Z     test_sort_and_select 1/1
2025-12-04T09:27:44.2999598Z     functorch/test_rearrange 1/1
2025-12-04T09:27:44.2999908Z     test_package 1/1
2025-12-04T09:27:44.3000150Z     test_mkl_verbose 1/1
2025-12-04T09:27:44.3000400Z     test_utils_config_module 1/1
2025-12-04T09:27:44.3000663Z     test_hop_infra 1/1
2025-12-04T09:27:44.3000915Z     test_appending_byte_serializer 1/1
2025-12-04T09:27:44.3001215Z     test_ao_sparsity 1/1
2025-12-04T09:27:44.3001462Z     test_extension_utils 1/1
2025-12-04T09:27:44.3001723Z     nn/attention/test_fa4 1/1
2025-12-04T09:27:44.3001991Z     typing/test_python_operators 1/1
2025-12-04T09:27:44.3002281Z     torch_np/test_dtype 1/1
2025-12-04T09:27:44.3002531Z     test_file_check 1/1
2025-12-04T09:27:44.3002767Z     profiler/test_kineto 1/1
2025-12-04T09:27:44.3003034Z     functorch/test_ac_knapsack 1/1
2025-12-04T09:27:44.3003330Z     torch_np/test_nep50_examples 1/1
2025-12-04T09:27:44.3003598Z     test_torch 1/1
2025-12-04T09:27:44.3003824Z     xpu/test_gemm 1/1
2025-12-04T09:27:44.3004060Z     test_binary_ufuncs 1/1
2025-12-04T09:27:44.3004297Z     test_modules 2/4
2025-12-04T09:27:44.3004560Z     torch_np/numpy_tests/linalg/test_linalg 1/1
2025-12-04T09:27:44.3004906Z     torch_np/numpy_tests/core/test_dtype 1/1
2025-12-04T09:27:44.3005208Z     lazy/test_debug_util 1/1
2025-12-04T09:27:44.3005467Z     nn/test_load_state_dict 1/1
2025-12-04T09:27:44.3005723Z     test_shape_ops 1/1
2025-12-04T09:27:44.3006065Z     profiler/test_memory_profiler 1/1
2025-12-04T09:27:44.3006344Z     test_indexing 1/1
2025-12-04T09:27:44.3006568Z     test_type_info 1/1
2025-12-04T09:27:44.3006812Z     functorch/test_aotdispatch 1/1
2025-12-04T09:27:44.3007092Z     test_scatter_gather_ops 1/1
2025-12-04T09:27:44.3007362Z     test_cuda_multigpu 1/1
2025-12-04T09:27:44.3007652Z     torch_np/numpy_tests/lib/test_index_tricks 1/1
2025-12-04T09:27:44.3008302Z     test_jit_autocast 1/1
2025-12-04T09:27:44.3008558Z     test_xnnpack_integration 1/1
2025-12-04T09:27:44.3008828Z     nn/test_init 1/1
2025-12-04T09:27:44.3009054Z     test_mobile_optimizer 1/1
2025-12-04T09:27:44.3009324Z     test_type_promotion 1/1
2025-12-04T09:27:44.3009587Z     test_reductions 1/1
2025-12-04T09:27:44.3009859Z     test_autoload_disable 1/1
2025-12-04T09:27:44.3010117Z   Parallel tests (0):
2025-12-04T09:27:44.3010362Z Name: excluded (est. time: 0.0min)
2025-12-04T09:27:44.3010627Z   Serial tests (0):
2025-12-04T09:27:44.3010855Z   Parallel tests (0):
2025-12-04T09:27:44.3011258Z Running inductor/test_aot_inductor 2/5 ... [2025-12-04 09:27:44.299075][936.308299477]
2025-12-04T09:27:44.3011734Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:27:44.3012791Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=2', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:27:44.299499]
2025-12-04T09:36:10.8931079Z 
2025-12-04T09:36:10.8941419Z inductor/test_aot_inductor 2/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_2.5_ac1d7e2a37fbed81_.log
2025-12-04T09:36:10.9035970Z Running 184 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_3_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_4_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_composed_dynamic_size_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_disable_one_pass_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_multiple_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_convolution_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dynamic_scalar_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_inf_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_dynamic_dim_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_no_args_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_permute_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_interleave_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_large_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_dot_product_efficient_attention_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_False_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_from_multi_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_and_mul_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_sympy_fn_like_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aliased_buffer_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_fp8_dtype_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_int64_user_defined_triton_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bool_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_type_propagation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dup_unbacked_sym_decl_with_refinement_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_scalar_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_embedding_bag_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_index_put_with_none_index_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quanatized_int8_linear_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replicate_on_devices_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_shifted_constraint_ranges_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_True_max_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_so_without_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_dynamic_launcher_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_with_none_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_view_outputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_cudagraphs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_name_collision_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_user_defined_triton_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_profiler_enable_kernel_profile_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_int64_user_defined_triton_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_non_tensor_predicates_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_predicate_on_cpu_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_share_predicate_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_outer_code_before_after_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_copy_non_blocking_is_pinned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dup_unbacked_sym_decl_with_refinement_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_dynamic_dim_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_on_disk_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misaligned_input_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_cubin_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_multiple_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_default_gpu_device_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_tensor_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_constant_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_shape_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_True_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_so_without_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_fn_like_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_weird_param_order_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps
2025-12-04T09:36:10.9129617Z 
2025-12-04T09:36:10.9129927Z Finished inductor/test_aot_inductor 2/5 ... [2025-12-04 09:36:10.892539][1442.901763277], took 8.44min
2025-12-04T09:36:10.9131055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d2163ec8f4306bf7.xml
2025-12-04T09:36:11.3136341Z Uploading artifacts took 0.11 seconds
2025-12-04T09:36:11.3139699Z Running inductor/test_torchinductor_codegen_dynamic_shapes 1/4 ... [2025-12-04 09:36:11.313643][1443.322864938]
2025-12-04T09:36:11.3140310Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:36:11.3144405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:36:11.314078]
2025-12-04T09:44:42.2450687Z 
2025-12-04T09:44:42.2452258Z inductor/test_torchinductor_codegen_dynamic_shapes 1/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_1.4_295ecc74e041d7f8_.log
2025-12-04T09:44:42.2727071Z Running 440 items in this shard: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_matmul_4bit_bf16_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex7_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_const_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_support_out_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_with_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_on_views_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_alignment_op_name_fail_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_alignment_op_name_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_int32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_negative_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_empty_1d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_upcasting_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_compar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_config_option_dont_assume_alignment_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_const_int32_to_float_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_functional_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_tensor_with_cpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_tensor_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cudnn_rnn_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_data_type_propogation_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_device_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div9_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_by_zero_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_bfloat16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_embedding_bag_byte_unpack_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_embedding_bag_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_expand_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_list_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fft_real_input_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmin_fmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_full_like_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_functionalize_rng_wrappers_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gather2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_generated_code_has_alignment_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gpu_scalar_with_cpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_failed_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inner_fn_str_and_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_int8_weight_only_quant_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_invalid_operand_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lgamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_like_rands_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linspace3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linspace4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_long_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_masked_fill_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mean_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_misaligned_address_issue1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mix_device_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_move_arange_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multi_gpu_recompile_on_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multi_threading_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_to_num_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_no_op_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_no_specization_over_symbolic_value_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_permute1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_philox_rand_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pixel_shuffle_channels_last_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_airy_ai_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_y1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_entr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_log_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtri_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_scaled_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_t_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_polar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_prod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_int64_mod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reinterpret_dtypeview_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_view_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_replication_pad_errors_with_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_require_stride_expanded_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_roll_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_cpu_tensor_arg_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_add2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_select_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_shape_padding_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_signbit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_mutation3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_reinplace_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_transpose_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumprod_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_failed_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_reduction_with_int64_size_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_list_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_stack_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_std_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_strided_inputs_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_tensor_index_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_topk_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triton_kernel_bool_param_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint4x2_mixed_mm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unsigned_constant_tensors_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_vdd_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_vectorized_ops_masked_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_view_on_aliased_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_const_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_const_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_addmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aliased_buffer_reuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_allow_reuse_active_if_under_peak_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_override_registration_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_as_strided_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_alignment_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d_backward3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_batch_norm_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bernoulli2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_computed_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int16_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_uint8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_copied_in_graph_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_empty_index_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_upcasting_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_compar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_consecutive_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_const_int32_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_fill_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv1d_depthwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_functional_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_convolution3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_with_scalar_src_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cudnn_rnn_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_no_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_default_layout_constraint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_data_type_propogation_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_device_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div9_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div_precision_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dont_constant_fold_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_bfloat16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_uint8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_bag_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exact_stride_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expand_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expand_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fill2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_flip_cat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float32_to_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fmod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fractional_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_like_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_gather1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_gather_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_glu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_arange2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_constant_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_mutation_real_name_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_pad_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_grid_sampler_expand_preserves_view_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_float_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_indirect_load_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inductor_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_flip_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_where_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_insignificant_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int8_weight_only_quant_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_offset_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_leaky_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linalg_eig_stride_consistency_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear_dynamic_maxautotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linspace1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linspace4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_mode_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_mode_not_decompose_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log_fp64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_logsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_masked_fill_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_min_max_reduction_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_misaligned_address_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mixed_mm2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_move_arange_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_prime_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_True_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_neg_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_new_ones_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_no_op_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nonzero_unbacked_refinement_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pad_view_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pattern_matcher_unbacked_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_philox_rand_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_bessel_y1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_digamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_entr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_i0e_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_legendre_polynomial_p_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_ndtr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_ndtri_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_profiler_mark_wrapper_call_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_distribution_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_int64_mod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_kernel_count_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scaled_dot_product_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_searchsorted_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_signbit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_simplify_loops_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_view_with_graph_break_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_one_kernel_loop_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_special_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_reduction_with_int64_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_with_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_stack_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_std_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_device_constant_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_transposed_propagates_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unbacked_floordiv_simplify_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsigned_constant_tensors_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bilinear2d_b_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_var_mean_div_by_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_var_mean_tile_reduction_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vdd_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_weight_norm_conv2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_where_broadcast_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zero_element_mutation_dynamic_shapes_cuda
2025-12-04T09:44:42.2994833Z 
2025-12-04T09:44:42.2995289Z Finished inductor/test_torchinductor_codegen_dynamic_shapes 1/4 ... [2025-12-04 09:44:42.245756][1954.254979915], took 8.52min
2025-12-04T09:44:42.2996757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-7dfb99a0e36ebc6b.xml
2025-12-04T09:44:42.3346945Z Running inductor/test_torchinductor_opinfo 4/14 ... [2025-12-04 09:44:42.334287][1954.343507072]
2025-12-04T09:44:42.3347516Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:44:42.3350251Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=4', '--num-shards=14', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:44:42.334645]
2025-12-04T09:55:07.4875911Z 
2025-12-04T09:55:07.4879303Z inductor/test_torchinductor_opinfo 4/14 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_4.14_2b71ae42f7581618_.log
2025-12-04T09:55:07.5058708Z Running 246 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_multinomial_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_bool
2025-12-04T09:55:07.5213908Z 
2025-12-04T09:55:07.5214280Z Finished inductor/test_torchinductor_opinfo 4/14 ... [2025-12-04 09:55:07.488269][2579.497491388], took 10.42min
2025-12-04T09:55:07.5215600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f45bd9366a90530e.xml
2025-12-04T09:55:07.5735987Z Running inductor/test_torchinductor_opinfo 12/14 ... [2025-12-04 09:55:07.573132][2579.582351573]
2025-12-04T09:55:07.5736738Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:55:07.5740027Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=12', '--num-shards=14', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:55:07.573533]
2025-12-04T10:05:26.2498727Z 
2025-12-04T10:05:26.2499852Z inductor/test_torchinductor_opinfo 12/14 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.14_f1debdb3c47cb0ae_.log
2025-12-04T10:05:26.2638404Z Running 257 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cauchy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_einsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int32
2025-12-04T10:05:26.2770553Z 
2025-12-04T10:05:26.2770915Z Finished inductor/test_torchinductor_opinfo 12/14 ... [2025-12-04 10:05:26.250225][3198.259448465], took 10.31min
2025-12-04T10:05:26.2772344Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-85306c1f70284b1c.xml
2025-12-04T10:05:26.5353596Z Uploading artifacts took 0.20 seconds
2025-12-04T10:05:26.5357542Z Running inductor/test_flex_attention 6/6 ... [2025-12-04 10:05:26.535409][3198.544632294]
2025-12-04T10:05:26.5358041Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:05:26.5361891Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_attention.py', '--shard-id=6', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:05:26.535751]
2025-12-04T10:15:31.7161279Z 
2025-12-04T10:15:31.7162504Z inductor/test_flex_attention 6/6 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_attention_6.6_cafbaa2a62098057_.log
2025-12-04T10:15:31.7242586Z Running 141 items in this shard: test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod3_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod4_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod5_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_autograd_function_in_score_mod_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_backend_triton_decode_errors_with_non_power_of_two_gqa_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_256_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod3_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_cant_lower_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_buffers_all_dims_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_wrong_device_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_custom_score_mod_layout_freeze_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dependent_causal_bidirectional_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order1_shape0_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_0_check_compile_False_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_index_weird1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod5_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_from_view_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_only_return_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_flex_decode_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_max_autotune_with_captured_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mixed_device_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mixed_dtypes_fails_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_score_mod_calls2_paged_attention_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_new_empty_mask_mod_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux__rel_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux__times_two_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux_deprecation_warnings_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_max__causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_ops_to_save0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_with_max_autotune_short_query_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_skip_odd_keys_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_small_block_mask_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_tma_with_customer_kernel_options_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_validate_small_embedding_size_error_message_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_doc_mask_clamped_repro_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_forward_pass_with_none_q_indices_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_from_kv_blocks_without_q_computation_full_indices_False_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_getitem_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_init_mismatched_full_q_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_upcast_appropriately_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_comparison_vs_sdpa_with_learnable_bias_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda
2025-12-04T10:15:31.7319650Z 
2025-12-04T10:15:31.7319977Z Finished inductor/test_flex_attention 6/6 ... [2025-12-04 10:15:31.715727][3803.724951519], took 10.09min
2025-12-04T10:15:31.7321147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-e8dc2e2d2922989b.xml
2025-12-04T10:15:31.8331962Z Running inductor/test_fp8 1/1 ... [2025-12-04 10:15:31.832810][3803.842032892]
2025-12-04T10:15:31.8332411Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:15:31.8338856Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:31.833158]
2025-12-04T10:35:19.5664334Z 
2025-12-04T10:35:19.5665486Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_440b1865b73f9802_.log)
2025-12-04T10:35:19.5667019Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml
2025-12-04T10:35:19.5668577Z ============================= test session starts ==============================
2025-12-04T10:35:19.5669359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.5670041Z cachedir: .pytest_cache
2025-12-04T10:35:19.5670897Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.5671763Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.5672152Z configfile: pytest.ini
2025-12-04T10:35:19.5672904Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.5673794Z collecting ... collected 188 items
2025-12-04T10:35:19.5674273Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:35:19.5816004Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda
2025-12-04T10:35:19.5931874Z 
2025-12-04T10:35:19.5933139Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.5935707Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.5937014Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.5937874Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.5938982Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.5940047Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.5941018Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.5942247Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.5943410Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.5944522Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.5945918Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.5946998Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.5947934Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.5948981Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.5949971Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.5950857Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.5952152Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.5953353Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.5954438Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.5955501Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.5956536Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.5957632Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.5958776Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.5959868Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.5960812Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.5961721Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.5962714Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.5963693Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.5964671Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.5965798Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.5966973Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.5968432Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.5969562Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.5971898Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.5974911Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.5976613Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.5978562Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.5980614Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.5982518Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.5984407Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.5986202Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.5987665Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.5989577Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.5991150Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.5992683Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.5993730Z ('RERUN', {'yellow': True}) [1.6928s] [  0%]
2025-12-04T10:35:19.5995304Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.5997692Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.5998991Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.5999995Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6001212Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6002431Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6003684Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6004981Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6006134Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6007309Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6008912Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6009903Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6011098Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6012503Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6013729Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6014886Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6016335Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6017657Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6018739Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6019914Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6021277Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6022693Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6024189Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6025500Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6026735Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6027935Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6029152Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6030219Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6031698Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6032959Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6034214Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6035841Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6037163Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6039798Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6042466Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6044158Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6045756Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6047208Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6048781Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6050234Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6051896Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6053271Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6054722Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6056013Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6057216Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6058227Z ('RERUN', {'yellow': True}) [0.2632s] [  0%]
2025-12-04T10:35:19.6059797Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6061775Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6063063Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6063918Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6064855Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6065807Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6066777Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6067809Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6069099Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6070374Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6071460Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6103655Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6104755Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6106043Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6107214Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6108551Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6109913Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6111350Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6112648Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6113962Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6115296Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6116744Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6118193Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6119572Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6120810Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6122270Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6123535Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6124770Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6126027Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6127393Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6128745Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6130320Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6131627Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6134449Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6139286Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6141200Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6143086Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6144895Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6146654Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6148401Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6150222Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6151838Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6153605Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6155218Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6156898Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6158125Z FAILED [0.2612s] [  0%]
2025-12-04T10:35:19.6158326Z 
2025-12-04T10:35:19.6158492Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.6159191Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6159726Z Traceback (most recent call last):
2025-12-04T10:35:19.6160348Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6161181Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6162090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6163057Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6164090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6165031Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6165799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6166817Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6167734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6172808Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6173930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6174858Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6175684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6176514Z     return self._compile_to_module()
2025-12-04T10:35:19.6177338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6178129Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6179154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6179959Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6180840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6181847Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6182960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6183889Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6184734Z   File "/tmp/tmpzcx134wn/6y/c6yly3762l5dpq4zpvhebgpc534lzyzc6pj2sy42ui4qzgnlb6o4.py", line 62, in <module>
2025-12-04T10:35:19.6186028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6186843Z     kernel.precompile(
2025-12-04T10:35:19.6187700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6188641Z     self._precompile_worker()
2025-12-04T10:35:19.6189511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6190375Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6191624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6192696Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6193494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6194383Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6195326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6196270Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6197061Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6198040Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6198858Z ^
2025-12-04T10:35:19.6199516Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6200178Z 
2025-12-04T10:35:19.6200985Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6201978Z 
2025-12-04T10:35:19.6201984Z 
2025-12-04T10:35:19.6202374Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6203645Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6204672Z 
2025-12-04T10:35:19.6204987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6205702Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6206229Z frames [('total', 1)]
2025-12-04T10:35:19.6206578Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6207141Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6208010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6208578Z graph_break []
2025-12-04T10:35:19.6209109Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6209774Z Traceback (most recent call last):
2025-12-04T10:35:19.6210544Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6211460Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6212413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6213401Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6214388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6215350Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6216292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6217270Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6218415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6219919Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6221129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6222075Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6223310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6224390Z     return self._compile_to_module()
2025-12-04T10:35:19.6225365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6226452Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6227534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6228617Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6229536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6230768Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6232041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6233112Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6234194Z   File "/tmp/tmpll1ds1ip/c4/cc4bj6oy5lpgnnzsganqw2wyma3jehcszsa6t5lh5n4zrraq2lfh.py", line 62, in <module>
2025-12-04T10:35:19.6235508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6236434Z     kernel.precompile(
2025-12-04T10:35:19.6237447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6238637Z     self._precompile_worker()
2025-12-04T10:35:19.6239823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6241065Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6242141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6243145Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6243993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6244920Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6245993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6247086Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6247986Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6249149Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6250226Z ^
2025-12-04T10:35:19.6250991Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6251896Z 
2025-12-04T10:35:19.6252753Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6253786Z 
2025-12-04T10:35:19.6253792Z 
2025-12-04T10:35:19.6254173Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6255387Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6256219Z 
2025-12-04T10:35:19.6256712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6257427Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6258192Z frames [('total', 1)]
2025-12-04T10:35:19.6258764Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6259630Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6260483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6261135Z graph_break []
2025-12-04T10:35:19.6261545Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6262041Z frames [('total', 1)]
2025-12-04T10:35:19.6262469Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6263021Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6263617Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6264167Z graph_break []
2025-12-04T10:35:19.6264521Z =================================== FAILURES ===================================
2025-12-04T10:35:19.6265251Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6265944Z Traceback (most recent call last):
2025-12-04T10:35:19.6266952Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6268034Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6269200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6270362Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6271582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6272819Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6274018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6274997Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6276160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6277396Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6278729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6279707Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6280722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6281872Z     return self._compile_to_module()
2025-12-04T10:35:19.6282755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6283770Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6284860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6285957Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6286821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6288022Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6289271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6290373Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6291373Z   File "/tmp/tmplem41v0m/hn/chnmbhr5xpr4zjplom7zth6gayoaam2rivjf2al6d2mzyjwimps5.py", line 62, in <module>
2025-12-04T10:35:19.6292760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6293835Z     kernel.precompile(
2025-12-04T10:35:19.6294881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6296029Z     self._precompile_worker()
2025-12-04T10:35:19.6297176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6298317Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6299705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6300727Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6301884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6302962Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6304002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6305179Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6306230Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6307359Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6308609Z ^
2025-12-04T10:35:19.6309378Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6310457Z 
2025-12-04T10:35:19.6311319Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6312354Z 
2025-12-04T10:35:19.6312360Z 
2025-12-04T10:35:19.6312622Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6314188Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6315293Z 
2025-12-04T10:35:19.6315717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6316481Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6317311Z frames [('total', 1)]
2025-12-04T10:35:19.6317765Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6318476Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6319286Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6393533Z graph_break []
2025-12-04T10:35:19.6394022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6394606Z frames [('total', 1)]
2025-12-04T10:35:19.6394986Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6395512Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6399219Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6399644Z graph_break []
2025-12-04T10:35:19.6399992Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6400485Z frames [('total', 1)]
2025-12-04T10:35:19.6400747Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6401236Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6401949Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6402479Z graph_break []
2025-12-04T10:35:19.6403372Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml -
2025-12-04T10:35:19.6404436Z =========================== short test summary info ============================
2025-12-04T10:35:19.6406073Z FAILED [0.2612s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6407654Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6408747Z ^
2025-12-04T10:35:19.6409405Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6410063Z 
2025-12-04T10:35:19.6410825Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6411747Z 
2025-12-04T10:35:19.6411753Z 
2025-12-04T10:35:19.6411992Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6413249Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6414176Z 
2025-12-04T10:35:19.6414450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6415035Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.6415503Z ========================== 1 failed, 2 rerun in 2.25s ==========================
2025-12-04T10:35:19.6415899Z Got exit code 1
2025-12-04T10:35:19.6416114Z Retrying single test...
2025-12-04T10:35:19.6416994Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml
2025-12-04T10:35:19.6417723Z ============================= test session starts ==============================
2025-12-04T10:35:19.6418460Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.6419225Z cachedir: .pytest_cache
2025-12-04T10:35:19.6420018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.6420905Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.6421284Z configfile: pytest.ini
2025-12-04T10:35:19.6422030Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.6422855Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.6423824Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6424649Z Running 1 items in this shard
2025-12-04T10:35:19.6424832Z 
2025-12-04T10:35:19.6425900Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6427897Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6429192Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6430036Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6430957Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6431895Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6432852Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6434040Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6435202Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6436423Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6437859Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6439047Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6440230Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6441222Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6442125Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6443003Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6444034Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6445256Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6446437Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6447559Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6448584Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6449677Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6450807Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6451887Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6452826Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6453709Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6454687Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6455656Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6456625Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6457675Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6458848Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6460357Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6461380Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6463590Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6466307Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6467885Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6469419Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6470929Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6472382Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6473829Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6475378Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6476678Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6478134Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6479367Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6480557Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6481538Z ('RERUN', {'yellow': True}) [1.6944s] [100%]
2025-12-04T10:35:19.6482814Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6484784Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6486067Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6487045Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6487973Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6488919Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6489889Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6490927Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6491991Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6493091Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6494191Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6495156Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6496078Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6497126Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6498027Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6498899Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6500127Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6501228Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6502243Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6503246Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6504268Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6505388Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6506549Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6507628Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6508916Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6509889Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6510932Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6511986Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6513097Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6514151Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6515318Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6516632Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6517649Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6519861Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6522197Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6523762Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6525295Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6526709Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6528152Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6529594Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6531111Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6532403Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6533843Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6535074Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6536305Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6537289Z ('RERUN', {'yellow': True}) [0.2623s] [100%]
2025-12-04T10:35:19.6538649Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6540693Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6541985Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6542833Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6543755Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6544693Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6545680Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6546728Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6547799Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6548897Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6550075Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6551031Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6551950Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6552914Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6553818Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6554693Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6555728Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6556831Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6557845Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6558843Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6559867Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6560962Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6562093Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6563163Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6564096Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6565067Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6566043Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6567013Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6567980Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6569037Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6570208Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6571527Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6572539Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6574740Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6577242Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6578703Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6580335Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6581749Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6583193Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6584630Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6586194Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6587481Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6588932Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6590165Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6591438Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6592396Z FAILED [0.2619s] [100%]
2025-12-04T10:35:19.6592548Z 
2025-12-04T10:35:19.6592667Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.6593182Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6593670Z Traceback (most recent call last):
2025-12-04T10:35:19.6594250Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6595043Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6595826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6596573Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6597335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6598048Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6598746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6599506Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6600192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6601041Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6601923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6602648Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6603337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6603964Z     return self._compile_to_module()
2025-12-04T10:35:19.6604570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6605236Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6605934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6606596Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6607231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6608189Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6609010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6609727Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6610352Z   File "/tmp/tmppwu8nk_3/4m/c4mqblwv37do654itola2doyw47mjtysh4z2t4si662nnp7a4ado.py", line 62, in <module>
2025-12-04T10:35:19.6611284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6611889Z     kernel.precompile(
2025-12-04T10:35:19.6612522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6613218Z     self._precompile_worker()
2025-12-04T10:35:19.6613911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6614685Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6615606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6616414Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6617089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6617801Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6618514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6619342Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6619947Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6620689Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6621321Z ^
2025-12-04T10:35:19.6621827Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6622333Z 
2025-12-04T10:35:19.6622942Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6623680Z 
2025-12-04T10:35:19.6623685Z 
2025-12-04T10:35:19.6623867Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6624965Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6625762Z 
2025-12-04T10:35:19.6625990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6626516Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6626899Z frames [('total', 1)]
2025-12-04T10:35:19.6627146Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6627525Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6628019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6628466Z graph_break []
2025-12-04T10:35:19.6628907Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6629452Z Traceback (most recent call last):
2025-12-04T10:35:19.6630035Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6630737Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6631470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6632264Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6633033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6633757Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6634470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6635140Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6635836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6636686Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6637523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6638203Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6638959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6639589Z     return self._compile_to_module()
2025-12-04T10:35:19.6640194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6640869Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6641557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6642235Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6642866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6643602Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6644421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6645147Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6645805Z   File "/tmp/tmpwr23i0hc/hh/chhcfwf5wj6yh4jwa4injyyvcfmhtvghjfm5pjquinmgloh6xgzy.py", line 62, in <module>
2025-12-04T10:35:19.6646767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6647370Z     kernel.precompile(
2025-12-04T10:35:19.6648006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6648782Z     self._precompile_worker()
2025-12-04T10:35:19.6649477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6650262Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6651027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6651830Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6652500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6653215Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6653927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6654732Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6655339Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6656127Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6656765Z ^
2025-12-04T10:35:19.6657273Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6657787Z 
2025-12-04T10:35:19.6658418Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6659184Z 
2025-12-04T10:35:19.6659188Z 
2025-12-04T10:35:19.6659389Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6660379Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6661189Z 
2025-12-04T10:35:19.6661425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6661968Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6662365Z frames [('total', 1)]
2025-12-04T10:35:19.6662605Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6662999Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6663606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6663990Z graph_break []
2025-12-04T10:35:19.6664312Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6664711Z frames [('total', 1)]
2025-12-04T10:35:19.6664946Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6665329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6665908Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6666318Z graph_break []
2025-12-04T10:35:19.6666559Z =================================== FAILURES ===================================
2025-12-04T10:35:19.6667073Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6667575Z Traceback (most recent call last):
2025-12-04T10:35:19.6668164Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6668903Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6669669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6670429Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6671213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6672057Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6695170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6695922Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6696635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6697508Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6698357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6699119Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6699779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6700425Z     return self._compile_to_module()
2025-12-04T10:35:19.6701039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6701702Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6702390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6703062Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6703703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6704441Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6705258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6705986Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6706626Z   File "/tmp/tmpd80y832x/23/c23whko23vhdgy2z4vyb3poajgcgorw2xfm2hs6apsy2gu2ag65k.py", line 62, in <module>
2025-12-04T10:35:19.6707562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6708479Z     kernel.precompile(
2025-12-04T10:35:19.6709112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6709800Z     self._precompile_worker()
2025-12-04T10:35:19.6710657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6711435Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6712201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6713008Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6713681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6714386Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6715075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6715853Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6716459Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6717197Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6717815Z ^
2025-12-04T10:35:19.6718308Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6718939Z 
2025-12-04T10:35:19.6719551Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6720272Z 
2025-12-04T10:35:19.6720276Z 
2025-12-04T10:35:19.6720462Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6721426Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6722226Z 
2025-12-04T10:35:19.6722452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6722977Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6723361Z frames [('total', 1)]
2025-12-04T10:35:19.6723598Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6723972Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6724477Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6724859Z graph_break []
2025-12-04T10:35:19.6725172Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6725603Z frames [('total', 1)]
2025-12-04T10:35:19.6725835Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6726200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6726701Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6727100Z graph_break []
2025-12-04T10:35:19.6727398Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6727773Z frames [('total', 1)]
2025-12-04T10:35:19.6728010Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6728365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6728854Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6729261Z graph_break []
2025-12-04T10:35:19.6729943Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml -
2025-12-04T10:35:19.6730753Z =========================== short test summary info ============================
2025-12-04T10:35:19.6731675Z FAILED [0.2619s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6732961Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6733589Z ^
2025-12-04T10:35:19.6734076Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6734587Z 
2025-12-04T10:35:19.6735190Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6735966Z 
2025-12-04T10:35:19.6735977Z 
2025-12-04T10:35:19.6736161Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6737135Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6737927Z 
2025-12-04T10:35:19.6738166Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6738657Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.6739156Z ================== 1 failed, 187 deselected, 2 rerun in 2.25s ==================
2025-12-04T10:35:19.6739528Z Got exit code 1
2025-12-04T10:35:19.6739739Z Retrying single test...
2025-12-04T10:35:19.6740294Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml
2025-12-04T10:35:19.6741037Z ============================= test session starts ==============================
2025-12-04T10:35:19.6741582Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.6742084Z cachedir: .pytest_cache
2025-12-04T10:35:19.6742677Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.6743346Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.6743635Z configfile: pytest.ini
2025-12-04T10:35:19.6744248Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.6745003Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.6745902Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6746718Z Running 1 items in this shard
2025-12-04T10:35:19.6746904Z 
2025-12-04T10:35:19.6747966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6749949Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6751231Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6752072Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6753002Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6753942Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6754892Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6756042Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6757103Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6758198Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6759277Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6760226Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6761147Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6762102Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6763005Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6763871Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6764907Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6766141Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6767151Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6768145Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6769171Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6770260Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6771386Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6772469Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6773401Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6774279Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6775252Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6776269Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6777233Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6778291Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6779501Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6780919Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6781924Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6784125Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6786509Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6787965Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6789491Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6790888Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6792415Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6793852Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6795392Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6796699Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6798143Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6799371Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6800548Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6801529Z ('RERUN', {'yellow': True}) [1.7020s] [100%]
2025-12-04T10:35:19.6802794Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6804770Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6806044Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6806883Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6808143Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6809095Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6810045Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6811067Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6812127Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6813224Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6814307Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6815256Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6816227Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6817299Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6818200Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6819116Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6820156Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6821252Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6822269Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6823262Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6824289Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6825378Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6826505Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6827575Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6828504Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6829384Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6830358Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6831321Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6832276Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6833408Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6834570Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6835932Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6836941Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6839148Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6841472Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6843004Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6844537Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6845992Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6847425Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6848862Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6850382Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6851672Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6853112Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6854331Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6855512Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6856495Z ('RERUN', {'yellow': True}) [0.2624s] [100%]
2025-12-04T10:35:19.6857758Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6859848Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6861124Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.6861971Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.6862887Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6863820Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.6864767Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6865839Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6866899Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6867990Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6869181Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6870134Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.6871060Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6872013Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.6873230Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.6874209Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.6875406Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6876644Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6877755Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6878914Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6880052Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6881217Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6882481Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6883672Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6884723Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6885827Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.6886883Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6887949Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.6889064Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6890239Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6891460Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6892914Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6894039Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.6896367Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6898971Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6900587Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6902270Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6903785Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6905319Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6906965Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6908877Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6910252Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6911834Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6913174Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6914615Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6915726Z FAILED [0.2621s] [100%]
2025-12-04T10:35:19.6915910Z 
2025-12-04T10:35:19.6916099Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.6916696Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6917338Z Traceback (most recent call last):
2025-12-04T10:35:19.6918007Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6918797Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6919690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6920550Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6921406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6922246Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6923066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6923818Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6924823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6925771Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6926685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6927558Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6928295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6928961Z     return self._compile_to_module()
2025-12-04T10:35:19.6929758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6930512Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6931324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6932091Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6932818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6933689Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6934628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6935423Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6936269Z   File "/tmp/tmpgjvjt70m/4k/c4kmuoqlw7haje36g2hzye4whuln2vkn37jxrokygmpdgjaschfj.py", line 62, in <module>
2025-12-04T10:35:19.6937322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6937975Z     kernel.precompile(
2025-12-04T10:35:19.6938773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6939637Z     self._precompile_worker()
2025-12-04T10:35:19.6940411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6941309Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6942270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6943188Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6943993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6944760Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6945632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6946561Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6947266Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6948067Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6948829Z ^
2025-12-04T10:35:19.6949446Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6949985Z 
2025-12-04T10:35:19.6950639Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6951438Z 
2025-12-04T10:35:19.6951442Z 
2025-12-04T10:35:19.6951672Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6952845Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6953703Z 
2025-12-04T10:35:19.6953948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6954653Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6955095Z frames [('total', 1)]
2025-12-04T10:35:19.6955413Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6955976Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6956573Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6956994Z graph_break []
2025-12-04T10:35:19.6957571Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6958153Z Traceback (most recent call last):
2025-12-04T10:35:19.6958773Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6959664Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6960491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6961383Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6962212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6963021Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6963893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6964672Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6965418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6966468Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6967409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6968205Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6968939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6969756Z     return self._compile_to_module()
2025-12-04T10:35:19.6970489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6971302Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6972038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6972829Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6973628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6974473Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6975343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6976251Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6977009Z   File "/tmp/tmpfasun0k5/5e/c5eag5bwyt6iyfm2wou25d5fxqzs53tabd65xn2grbj46tetm5rr.py", line 62, in <module>
2025-12-04T10:35:19.6978009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6978756Z     kernel.precompile(
2025-12-04T10:35:19.6979588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6980442Z     self._precompile_worker()
2025-12-04T10:35:19.6981284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6982150Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6982989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6983956Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6984675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6985506Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6986373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6987238Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6987878Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6988796Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6989506Z ^
2025-12-04T10:35:19.6990147Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6990698Z 
2025-12-04T10:35:19.6991338Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6992120Z 
2025-12-04T10:35:19.6992124Z 
2025-12-04T10:35:19.6992339Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6993463Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6994305Z 
2025-12-04T10:35:19.6994594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6995171Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6995714Z frames [('total', 1)]
2025-12-04T10:35:19.6996061Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6996548Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6997228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6997725Z graph_break []
2025-12-04T10:35:19.6998158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6998641Z frames [('total', 1)]
2025-12-04T10:35:19.6998989Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6999465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7000123Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7000593Z graph_break []
2025-12-04T10:35:19.7000929Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7001621Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.7002190Z Traceback (most recent call last):
2025-12-04T10:35:19.7002858Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7003708Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7004569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7005360Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7006267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7008445Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7009356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7010204Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7010984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7011913Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7012938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7013679Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7014405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7015218Z     return self._compile_to_module()
2025-12-04T10:35:19.7015922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7016651Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7017501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7018264Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7019127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7019938Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7020837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7021734Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7022437Z   File "/tmp/tmps4_nfeqs/7v/c7vd5dgbf5mkzsxjeurorup7kyyotjy6sbjzorolm46evt2byqvh.py", line 62, in <module>
2025-12-04T10:35:19.7023480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7024221Z     kernel.precompile(
2025-12-04T10:35:19.7024951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7025713Z     self._precompile_worker()
2025-12-04T10:35:19.7026665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7027578Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7028435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7029325Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7030124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7030921Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7031771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7032628Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7033320Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7034211Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7034942Z ^
2025-12-04T10:35:19.7035517Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7036158Z 
2025-12-04T10:35:19.7036958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7037743Z 
2025-12-04T10:35:19.7037748Z 
2025-12-04T10:35:19.7037964Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7038995Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.7047346Z 
2025-12-04T10:35:19.7047611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7048151Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7048534Z frames [('total', 1)]
2025-12-04T10:35:19.7048779Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7049157Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7049669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7050045Z graph_break []
2025-12-04T10:35:19.7050346Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7050728Z frames [('total', 1)]
2025-12-04T10:35:19.7050960Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7051320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7051819Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7052208Z graph_break []
2025-12-04T10:35:19.7052516Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7052900Z frames [('total', 1)]
2025-12-04T10:35:19.7053132Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7053490Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7053985Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7054383Z graph_break []
2025-12-04T10:35:19.7055064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml -
2025-12-04T10:35:19.7055874Z =========================== short test summary info ============================
2025-12-04T10:35:19.7056805Z FAILED [0.2621s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7058113Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7058730Z ^
2025-12-04T10:35:19.7059291Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7059798Z 
2025-12-04T10:35:19.7060413Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7061138Z 
2025-12-04T10:35:19.7061142Z 
2025-12-04T10:35:19.7061332Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7062301Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.7063101Z 
2025-12-04T10:35:19.7063333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7063833Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7064263Z ================== 1 failed, 187 deselected, 2 rerun in 2.26s ==================
2025-12-04T10:35:19.7064633Z Got exit code 1
2025-12-04T10:35:19.7065235Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.7066256Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.7067119Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml
2025-12-04T10:35:19.7067765Z ============================= test session starts ==============================
2025-12-04T10:35:19.7068317Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7068814Z cachedir: .pytest_cache
2025-12-04T10:35:19.7069415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7070091Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7070380Z configfile: pytest.ini
2025-12-04T10:35:19.7070991Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7071747Z collecting ... collected 188 items / 1 deselected / 187 selected
2025-12-04T10:35:19.7072167Z stepcurrent: skipping 1 already run items.
2025-12-04T10:35:19.7072483Z Running 187 items in this shard
2025-12-04T10:35:19.7072658Z 
2025-12-04T10:35:19.7073731Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7075708Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7076992Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7077842Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7078280Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7078670Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7079118Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7079694Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7080185Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7080674Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7081155Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7081522Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7081960Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7082359Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7082743Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7083118Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7083663Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7084180Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7084632Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7085055Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7085549Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7086028Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7086555Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7086986Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7087377Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7087755Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7088232Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7088605Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7089083Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7089537Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7090138Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7090812Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7091116Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7092901Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7093363Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7094254Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7094792Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7095624Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7096198Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7096945Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7097597Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7098121Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7098933Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7099287Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7100050Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7100163Z ('RERUN', {'yellow': True}) [1.7114s] [  0%]
2025-12-04T10:35:19.7101212Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7102016Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7102374Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7102821Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7103269Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7103654Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7104101Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7104564Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7105052Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7105552Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7106017Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7106387Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7106825Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7107327Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7108249Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7108727Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7109287Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7109727Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7110183Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7110611Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7111097Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7111587Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7112113Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7112537Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7112937Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7113309Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7113800Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7114164Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7114787Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7115240Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7115829Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7116428Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7116727Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7118523Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7118974Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7119973Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7120505Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7121267Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7121849Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7122599Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7123254Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7123776Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7124594Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7124894Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7125663Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7125780Z ('RERUN', {'yellow': True}) [0.2653s] [  0%]
2025-12-04T10:35:19.7126921Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7127740Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7128097Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7128482Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7128915Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7129298Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7129761Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7130212Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7130709Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7131196Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7131744Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7132116Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7132554Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7132952Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7133337Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7133712Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7134266Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7134706Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7135171Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7135641Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7136133Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7136617Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7137144Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7137574Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7137963Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7138419Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7138904Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7139324Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7139817Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7140263Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7140861Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7141459Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7141755Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7143536Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7144074Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7144962Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7145522Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7146312Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7146885Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7147646Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7148298Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7148825Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7149632Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7149932Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7150798Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7150882Z FAILED [0.2650s] [  0%]
2025-12-04T10:35:19.7150887Z 
2025-12-04T10:35:19.7151009Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7151278Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7151385Z Traceback (most recent call last):
2025-12-04T10:35:19.7151778Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7151987Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7152413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7152625Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7153066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7153231Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7153662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7153861Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7154323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7154592Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7155035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7155161Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7155571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7155676Z     return self._compile_to_module()
2025-12-04T10:35:19.7156086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7156227Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7156667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7156770Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7157195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7157387Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7157884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7157993Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7158424Z   File "/tmp/tmph4pfb5gr/di/cdid7yg2wwwtxydkq7m5a4t26b4ojjxldgdwhzxvmxuaety5klet.py", line 62, in <module>
2025-12-04T10:35:19.7158831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7158921Z     kernel.precompile(
2025-12-04T10:35:19.7159394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7159492Z     self._precompile_worker()
2025-12-04T10:35:19.7159998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7160147Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7160728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7160898Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7161281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7161487Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7161865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7162155Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7162347Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7162787Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7162857Z ^
2025-12-04T10:35:19.7163251Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7163256Z 
2025-12-04T10:35:19.7163864Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7163869Z 
2025-12-04T10:35:19.7163873Z 
2025-12-04T10:35:19.7164131Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7164817Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7164822Z 
2025-12-04T10:35:19.7165043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7165232Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7165320Z frames [('total', 1)]
2025-12-04T10:35:19.7165426Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7165628Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7165810Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7165893Z graph_break []
2025-12-04T10:35:19.7166166Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7166267Z Traceback (most recent call last):
2025-12-04T10:35:19.7166660Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7166860Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7167275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7167491Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7167932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7168089Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7168523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7168646Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7169104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7169381Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7169821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7169948Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7170435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7170539Z     return self._compile_to_module()
2025-12-04T10:35:19.7170949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7171082Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7171525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7171635Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7172054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7172257Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7172752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7172864Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7173293Z   File "/tmp/tmp3grrqx61/vv/cvv2v522hnbk3edgdy4pt67uldhm62ysumjfcxgcxspjkoj5fazb.py", line 62, in <module>
2025-12-04T10:35:19.7173685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7173787Z     kernel.precompile(
2025-12-04T10:35:19.7174258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7174434Z     self._precompile_worker()
2025-12-04T10:35:19.7174939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7175084Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7175640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7175809Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7176188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7176401Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7176770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7177061Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7177250Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7177683Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7177757Z ^
2025-12-04T10:35:19.7178144Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7178156Z 
2025-12-04T10:35:19.7178768Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7178773Z 
2025-12-04T10:35:19.7178777Z 
2025-12-04T10:35:19.7178956Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7179686Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7179704Z 
2025-12-04T10:35:19.7179928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7180105Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7180193Z frames [('total', 1)]
2025-12-04T10:35:19.7180286Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7180576Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7180767Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7180846Z graph_break []
2025-12-04T10:35:19.7181029Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7181116Z frames [('total', 1)]
2025-12-04T10:35:19.7181209Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7181394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7181592Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7181669Z graph_break []
2025-12-04T10:35:19.7181787Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7182053Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7182151Z Traceback (most recent call last):
2025-12-04T10:35:19.7182537Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7182739Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7183152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7183357Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7183791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7184035Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7184465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7184591Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7185048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7185325Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7185822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7185940Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7186342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7186453Z     return self._compile_to_module()
2025-12-04T10:35:19.7186861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7186999Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7187434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7187539Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7187963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7188153Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7188657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7188780Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7189219Z   File "/tmp/tmp5gmwdyh3/xg/cxg7cacdaqrctjmjcdvbfa4kodev4x2fsazm3rlfps7jo3hvjass.py", line 62, in <module>
2025-12-04T10:35:19.7189612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7189708Z     kernel.precompile(
2025-12-04T10:35:19.7190180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7190384Z     self._precompile_worker()
2025-12-04T10:35:19.7190897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7191045Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7191556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7191728Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7192104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7192312Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7192686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7192984Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7193175Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7193609Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7193685Z ^
2025-12-04T10:35:19.7194071Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7194156Z 
2025-12-04T10:35:19.7194765Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7194769Z 
2025-12-04T10:35:19.7194773Z 
2025-12-04T10:35:19.7194957Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7195689Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7195699Z 
2025-12-04T10:35:19.7195927Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7196102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7196193Z frames [('total', 1)]
2025-12-04T10:35:19.7196284Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7196481Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7196675Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7196751Z graph_break []
2025-12-04T10:35:19.7196935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7197019Z frames [('total', 1)]
2025-12-04T10:35:19.7197111Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7197298Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7197495Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7197572Z graph_break []
2025-12-04T10:35:19.7197748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7197829Z frames [('total', 1)]
2025-12-04T10:35:19.7197921Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7198111Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7198305Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7198392Z graph_break []
2025-12-04T10:35:19.7198950Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml -
2025-12-04T10:35:19.7199095Z =========================== short test summary info ============================
2025-12-04T10:35:19.7199940Z FAILED [0.2650s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7200456Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7200528Z ^
2025-12-04T10:35:19.7200915Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7200920Z 
2025-12-04T10:35:19.7201524Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7201534Z 
2025-12-04T10:35:19.7201538Z 
2025-12-04T10:35:19.7201723Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7202398Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7202402Z 
2025-12-04T10:35:19.7202637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7202786Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7202948Z =================== 1 failed, 1 deselected, 2 rerun in 2.28s ===================
2025-12-04T10:35:19.7203028Z Got exit code 1
2025-12-04T10:35:19.7203118Z Retrying single test...
2025-12-04T10:35:19.7203523Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml
2025-12-04T10:35:19.7203825Z ============================= test session starts ==============================
2025-12-04T10:35:19.7204114Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7204208Z cachedir: .pytest_cache
2025-12-04T10:35:19.7204652Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7204760Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7204851Z configfile: pytest.ini
2025-12-04T10:35:19.7205320Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7205541Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7206158Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7206255Z Running 1 items in this shard
2025-12-04T10:35:19.7206260Z 
2025-12-04T10:35:19.7207330Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7208366Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7208734Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7209106Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7209557Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7209942Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7210396Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7210979Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7211475Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7211971Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7212444Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7212819Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7213262Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7213665Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7214050Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7214430Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7214972Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7215574Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7216037Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7216458Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7216951Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7217430Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7217966Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7218399Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7218792Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7219205Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7219691Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7220063Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7220549Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7221008Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7221599Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7222282Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7222583Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7224367Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7224834Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7225773Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7226310Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7227169Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7227747Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7228501Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7229172Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7229686Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7230504Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7230814Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7231577Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7231693Z ('RERUN', {'yellow': True}) [1.7086s] [100%]
2025-12-04T10:35:19.7232748Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7233561Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7233918Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7234290Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7234813Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7235198Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7235652Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7236108Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7236596Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7237092Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7237571Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7237948Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7238384Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7238863Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7239244Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7239616Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7240171Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7240611Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7241075Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7241497Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7241987Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7242474Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7243007Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7243434Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7243826Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7244195Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7244686Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7245052Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7245617Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7246066Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7246658Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7247257Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7247559Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7249349Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7249799Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7250773Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7251303Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7252068Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7252643Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7253396Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7254056Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7254572Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7255394Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7255700Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7256466Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7256581Z ('RERUN', {'yellow': True}) [0.2657s] [100%]
2025-12-04T10:35:19.7257640Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7258542Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7258927Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7259390Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7259855Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7260275Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7260760Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7261255Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7261784Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7262311Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7262867Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7263232Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7263674Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7264084Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7264466Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7264841Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7265400Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7265880Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7266346Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7266769Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7267260Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7267742Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7268277Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7268701Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7269090Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7269556Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7270043Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7270412Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7270895Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7271348Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7271950Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7272549Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7272855Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7274632Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7275204Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7276085Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7276625Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7277384Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7277960Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7278723Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7279377Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7279901Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7280711Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7281025Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7281862Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7281945Z FAILED [0.2652s] [100%]
2025-12-04T10:35:19.7281950Z 
2025-12-04T10:35:19.7282071Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7282340Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7282452Z Traceback (most recent call last):
2025-12-04T10:35:19.7282838Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7283038Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7283456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7283665Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7284108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7284272Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7284701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7284912Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7285362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7285655Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7286126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7286250Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7286667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7286770Z     return self._compile_to_module()
2025-12-04T10:35:19.7287178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7287320Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7287761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7287865Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7288287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7288478Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7288983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7289085Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7289512Z   File "/tmp/tmpv46b3rzk/rs/crsizjblpp47j77ikke7sn2zycwm7pk7pz3ig2sccvrsf6mc25l3.py", line 62, in <module>
2025-12-04T10:35:19.7289906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7289993Z     kernel.precompile(
2025-12-04T10:35:19.7290473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7290565Z     self._precompile_worker()
2025-12-04T10:35:19.7291069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7291218Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7291802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7291971Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7292349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7292550Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7292926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7293211Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7293403Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7293842Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7293909Z ^
2025-12-04T10:35:19.7294306Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7294311Z 
2025-12-04T10:35:19.7294914Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7294919Z 
2025-12-04T10:35:19.7294923Z 
2025-12-04T10:35:19.7295106Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7295923Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7295928Z 
2025-12-04T10:35:19.7296149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7296336Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7296417Z frames [('total', 1)]
2025-12-04T10:35:19.7296511Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7296717Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7296900Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7296984Z graph_break []
2025-12-04T10:35:19.7297252Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7297350Z Traceback (most recent call last):
2025-12-04T10:35:19.7297746Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7297945Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7298356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7298565Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7299002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7299211Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7299647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7299768Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7300222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7300497Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7300939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7301058Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7301460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7301644Z     return self._compile_to_module()
2025-12-04T10:35:19.7302054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7302190Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7302629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7302739Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7303161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7303353Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7303849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7303959Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7304392Z   File "/tmp/tmp3rljc4wb/a6/ca6mehpo2smnii23oqqdmk3z7tb3ehs5wa5gwwusvheoygzbfdlu.py", line 62, in <module>
2025-12-04T10:35:19.7304785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7304871Z     kernel.precompile(
2025-12-04T10:35:19.7305339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7305565Z     self._precompile_worker()
2025-12-04T10:35:19.7306074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7306223Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7306729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7306899Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7307288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7307490Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7308044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7308335Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7308525Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7308962Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7309031Z ^
2025-12-04T10:35:19.7309419Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7309423Z 
2025-12-04T10:35:19.7310042Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7310047Z 
2025-12-04T10:35:19.7310051Z 
2025-12-04T10:35:19.7310233Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7310919Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7310928Z 
2025-12-04T10:35:19.7311157Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7311344Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7311432Z frames [('total', 1)]
2025-12-04T10:35:19.7311523Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7311729Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7312062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7312145Z graph_break []
2025-12-04T10:35:19.7312325Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7312407Z frames [('total', 1)]
2025-12-04T10:35:19.7312497Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7312680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7312883Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7312962Z graph_break []
2025-12-04T10:35:19.7313078Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7313353Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7313463Z Traceback (most recent call last):
2025-12-04T10:35:19.7313847Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7314051Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7314466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7314670Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7315111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7315406Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7315871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7315998Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7316455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7316735Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7317173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7317291Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7317697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7317798Z     return self._compile_to_module()
2025-12-04T10:35:19.7318206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7318346Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7318782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7318890Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7319310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7319501Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7320000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7320102Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7320526Z   File "/tmp/tmp2n_j7lvu/iw/ciwty7kgg2xlox3iafaem2ishfgcqak44inlcwyvbbmi63ff2ard.py", line 62, in <module>
2025-12-04T10:35:19.7320914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7321008Z     kernel.precompile(
2025-12-04T10:35:19.7321480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7321573Z     self._precompile_worker()
2025-12-04T10:35:19.7322158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7322311Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7322813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7322991Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7323372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7323572Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7323956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7324238Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7324441Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7324872Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7324940Z ^
2025-12-04T10:35:19.7325335Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7325421Z 
2025-12-04T10:35:19.7326026Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7326031Z 
2025-12-04T10:35:19.7326035Z 
2025-12-04T10:35:19.7326215Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7326893Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7326902Z 
2025-12-04T10:35:19.7327124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7327307Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7327390Z frames [('total', 1)]
2025-12-04T10:35:19.7327491Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7327687Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7327875Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7327957Z graph_break []
2025-12-04T10:35:19.7328341Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7341080Z frames [('total', 1)]
2025-12-04T10:35:19.7341214Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7341472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7341707Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7341807Z graph_break []
2025-12-04T10:35:19.7342006Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7342098Z frames [('total', 1)]
2025-12-04T10:35:19.7342198Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7342391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7342590Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7342680Z graph_break []
2025-12-04T10:35:19.7343257Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml -
2025-12-04T10:35:19.7343407Z =========================== short test summary info ============================
2025-12-04T10:35:19.7344079Z FAILED [0.2652s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7344643Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7344752Z ^
2025-12-04T10:35:19.7345242Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7345249Z 
2025-12-04T10:35:19.7345983Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7345993Z 
2025-12-04T10:35:19.7345997Z 
2025-12-04T10:35:19.7346190Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7346877Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7346882Z 
2025-12-04T10:35:19.7347122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7347280Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7347481Z ================== 1 failed, 187 deselected, 2 rerun in 2.27s ==================
2025-12-04T10:35:19.7347570Z Got exit code 1
2025-12-04T10:35:19.7347661Z Retrying single test...
2025-12-04T10:35:19.7348067Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml
2025-12-04T10:35:19.7348303Z ============================= test session starts ==============================
2025-12-04T10:35:19.7348600Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7348698Z cachedir: .pytest_cache
2025-12-04T10:35:19.7349148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7349255Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7349359Z configfile: pytest.ini
2025-12-04T10:35:19.7349826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7350018Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7350637Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7350745Z Running 1 items in this shard
2025-12-04T10:35:19.7350749Z 
2025-12-04T10:35:19.7351822Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7352646Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7353017Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7353394Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7353840Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7354238Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7354692Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7355259Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7355798Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7356293Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7356779Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7357153Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7357600Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7358004Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7358393Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7358776Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7359326Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7359857Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7360319Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7360779Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7361316Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7361804Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7362369Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7362835Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7363239Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7363616Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7364106Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7364485Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7364972Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7365530Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7366301Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7370642Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7370969Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7372757Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7373229Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7374181Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7374823Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7375725Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7376721Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7377602Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7378253Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7378776Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7379737Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7380045Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7380809Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7380919Z ('RERUN', {'yellow': True}) [1.7074s] [100%]
2025-12-04T10:35:19.7381980Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7382824Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7383190Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7383563Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7384102Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7384490Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7384944Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7385453Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7385942Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7386433Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7386903Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7387272Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7387708Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7388222Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7388607Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7388975Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7410283Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7410753Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7411207Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7411630Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7412125Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7412607Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7413141Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7413571Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7413967Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7414336Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7414827Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7415203Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7415730Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7416415Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7417010Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7417612Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7417925Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7419814Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7420299Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7421379Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7421954Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7422776Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7423397Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7424201Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7424912Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7425520Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7426408Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7426737Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7427557Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7427679Z ('RERUN', {'yellow': True}) [0.2663s] [100%]
2025-12-04T10:35:19.7428828Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7429787Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7430147Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7430533Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.7430979Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7431363Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7431823Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7432279Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7432780Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7433270Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7433824Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7434195Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.7434637Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7435043Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.7435455Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.7435850Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.7436399Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7436843Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7437298Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7437731Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7438224Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7438706Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7439252Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7439677Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7440073Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7440527Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.7441008Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7441384Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.7441863Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7442320Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7442916Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7443518Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7443818Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7445649Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7446197Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7447090Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7447628Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7448385Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7448966Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7449714Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7450373Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7450892Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7451707Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7452016Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7452887Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7452979Z FAILED [0.2653s] [100%]
2025-12-04T10:35:19.7452985Z 
2025-12-04T10:35:19.7453112Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7453395Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7453503Z Traceback (most recent call last):
2025-12-04T10:35:19.7453887Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7454105Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7454526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7454742Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7455199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7455360Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7455803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7455924Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7456461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7456746Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7457192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7457329Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7457746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7457847Z     return self._compile_to_module()
2025-12-04T10:35:19.7458264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7458400Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7458837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7458959Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7459465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7459666Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7460165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7460274Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7460715Z   File "/tmp/tmpb9w0s2xl/xf/cxfgzk5y7ii4s24flmdrloryw2k5hvtbdpigtzky3asn5fgwefle.py", line 62, in <module>
2025-12-04T10:35:19.7461107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7461203Z     kernel.precompile(
2025-12-04T10:35:19.7461683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7461779Z     self._precompile_worker()
2025-12-04T10:35:19.7462289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7462441Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7463029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7463204Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7463582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7463795Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7464167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7464456Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7464655Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7465095Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7465177Z ^
2025-12-04T10:35:19.7465628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7465633Z 
2025-12-04T10:35:19.7466236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7466243Z 
2025-12-04T10:35:19.7466247Z 
2025-12-04T10:35:19.7466432Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7467201Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7467206Z 
2025-12-04T10:35:19.7467436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7467617Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7467709Z frames [('total', 1)]
2025-12-04T10:35:19.7467811Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7468017Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7468207Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7468290Z graph_break []
2025-12-04T10:35:19.7468560Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7468674Z Traceback (most recent call last):
2025-12-04T10:35:19.7469054Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7469264Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7469688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7469894Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7470347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7470509Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7470940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7471067Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7471519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7471797Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7472239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7472360Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7472769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7473012Z     return self._compile_to_module()
2025-12-04T10:35:19.7473423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7473567Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7474007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7474125Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7474544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7474736Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7475237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7475340Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7475753Z   File "/tmp/tmpip_kb2yj/tg/ctgwtyu5m2wrux5ehu73k2o5wof472qed5rnrw7nbx2o5ar533mj.py", line 62, in <module>
2025-12-04T10:35:19.7476143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7476232Z     kernel.precompile(
2025-12-04T10:35:19.7476715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7476895Z     self._precompile_worker()
2025-12-04T10:35:19.7477404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7477566Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7478074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7478254Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7478644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7478854Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7479235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7479523Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7479731Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7480167Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7480240Z ^
2025-12-04T10:35:19.7480641Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7480645Z 
2025-12-04T10:35:19.7481257Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7481262Z 
2025-12-04T10:35:19.7481266Z 
2025-12-04T10:35:19.7481454Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7482137Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7482146Z 
2025-12-04T10:35:19.7482372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7482564Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7482652Z frames [('total', 1)]
2025-12-04T10:35:19.7482754Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7482958Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7483228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7483317Z graph_break []
2025-12-04T10:35:19.7483497Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7483584Z frames [('total', 1)]
2025-12-04T10:35:19.7483690Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7483875Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7484076Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7484164Z graph_break []
2025-12-04T10:35:19.7484288Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7484572Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7484676Z Traceback (most recent call last):
2025-12-04T10:35:19.7485066Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7485288Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7485750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7485976Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7486418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7486665Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7487117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7487245Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7487698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7487988Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7488434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7488573Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7488983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7489096Z     return self._compile_to_module()
2025-12-04T10:35:19.7489521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7489662Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7490106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7490214Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7490640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7490842Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7491340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7491445Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7491983Z   File "/tmp/tmptldim1qi/qg/cqgvvref2l6hoiciiaz32zvx44lbwazoxstoh4jwszxsk55wyxef.py", line 62, in <module>
2025-12-04T10:35:19.7492380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7492482Z     kernel.precompile(
2025-12-04T10:35:19.7492958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7493055Z     self._precompile_worker()
2025-12-04T10:35:19.7493690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7493846Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7494360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7494527Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7494926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7495148Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7495536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7495867Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7496081Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7496520Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7496605Z ^
2025-12-04T10:35:19.7496998Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7497002Z 
2025-12-04T10:35:19.7497698Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7497721Z 
2025-12-04T10:35:19.7497725Z 
2025-12-04T10:35:19.7497906Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7498587Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7498592Z 
2025-12-04T10:35:19.7498827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7499008Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7499148Z frames [('total', 1)]
2025-12-04T10:35:19.7499242Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7499440Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7499639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7499718Z graph_break []
2025-12-04T10:35:19.7499899Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7499993Z frames [('total', 1)]
2025-12-04T10:35:19.7500087Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7500268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7500470Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7500554Z graph_break []
2025-12-04T10:35:19.7500748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7500832Z frames [('total', 1)]
2025-12-04T10:35:19.7500927Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7501116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7501308Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7501391Z graph_break []
2025-12-04T10:35:19.7501963Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml -
2025-12-04T10:35:19.7502106Z =========================== short test summary info ============================
2025-12-04T10:35:19.7502774Z FAILED [0.2653s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7503298Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7503370Z ^
2025-12-04T10:35:19.7503767Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7503772Z 
2025-12-04T10:35:19.7504376Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7504385Z 
2025-12-04T10:35:19.7504389Z 
2025-12-04T10:35:19.7504575Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7505261Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7505266Z 
2025-12-04T10:35:19.7505537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7505706Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7505880Z ================== 1 failed, 187 deselected, 2 rerun in 2.27s ==================
2025-12-04T10:35:19.7505973Z Got exit code 1
2025-12-04T10:35:19.7506444Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7506878Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.7507290Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml
2025-12-04T10:35:19.7507426Z ============================= test session starts ==============================
2025-12-04T10:35:19.7507950Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7508046Z cachedir: .pytest_cache
2025-12-04T10:35:19.7508500Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7508609Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7508699Z configfile: pytest.ini
2025-12-04T10:35:19.7509162Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7509367Z collecting ... collected 188 items / 2 deselected / 186 selected
2025-12-04T10:35:19.7509485Z stepcurrent: skipping 2 already run items.
2025-12-04T10:35:19.7509587Z Running 186 items in this shard
2025-12-04T10:35:19.7509592Z 
2025-12-04T10:35:19.7510589Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7511274Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7511659Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7512120Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7512606Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7513083Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7513457Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7514082Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7514525Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7514998Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7515427Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7515823Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7516192Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7516677Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7517052Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7517527Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7517975Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7518540Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7518841Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7520489Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7520955Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7521855Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7522385Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7523147Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7523723Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7524484Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7525135Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7525728Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7526419Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7526720Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7527496Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7527606Z ('RERUN', {'yellow': True}) [2.0495s] [  0%]
2025-12-04T10:35:19.7528597Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7529270Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7529648Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7530122Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7530703Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7531196Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7531567Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7532072Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7532534Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7532996Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7533449Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7533854Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7534229Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7534719Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7535084Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7535616Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7536062Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7536527Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7536830Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7538548Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7539077Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7539973Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7540548Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7541313Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7541910Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7542747Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7543405Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7543945Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7544630Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7544950Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7545727Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7545855Z ('RERUN', {'yellow': True}) [0.4441s] [  0%]
2025-12-04T10:35:19.7546855Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7547531Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7547923Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7548393Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7548879Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7549355Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7549806Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7550323Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7550772Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7551261Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7551697Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7552105Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7552480Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7552962Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7553342Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7553816Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7554362Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7554824Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7555137Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7556844Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7557313Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7558221Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7558767Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7559532Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7560109Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7560888Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7561555Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7562158Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7562852Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7563164Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7563949Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7564035Z FAILED [0.4405s] [  0%]
2025-12-04T10:35:19.7564040Z 
2025-12-04T10:35:19.7564169Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7564465Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7564571Z Traceback (most recent call last):
2025-12-04T10:35:19.7564977Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7565191Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7565610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7565923Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7566363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7566541Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7566985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7567117Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7567588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7567869Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7568316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7568458Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7568866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7568982Z     return self._compile_to_module()
2025-12-04T10:35:19.7569393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7569531Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7569998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7570110Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7570553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7570757Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7571263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7571388Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7571838Z   File "/tmp/tmp4dsrgto4/e7/ce7frc7nur2mwskyxcvnk6xuunrzu6zbr44yj7npyhf66f6bjjgq.py", line 163, in <module>
2025-12-04T10:35:19.7572240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7572345Z     kernel.precompile(
2025-12-04T10:35:19.7572900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7573014Z     self._precompile_worker()
2025-12-04T10:35:19.7573532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7573689Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7574225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7574396Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7574796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7575012Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7575397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7575749Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7575946Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7576263Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7576497Z ^
2025-12-04T10:35:19.7576896Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7576901Z 
2025-12-04T10:35:19.7577534Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7577539Z 
2025-12-04T10:35:19.7577543Z 
2025-12-04T10:35:19.7577734Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7578449Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7578454Z 
2025-12-04T10:35:19.7578681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7578864Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7578967Z frames [('total', 1)]
2025-12-04T10:35:19.7579113Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7579320Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7579521Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7579604Z graph_break []
2025-12-04T10:35:19.7579905Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7580010Z Traceback (most recent call last):
2025-12-04T10:35:19.7580400Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7580614Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7581032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7581263Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7581707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7581876Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7582335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7582463Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7583001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7583293Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7583747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7583887Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7584313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7584419Z     return self._compile_to_module()
2025-12-04T10:35:19.7584838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7584973Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7585439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7585572Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7586011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7586222Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7586722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7586919Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7587355Z   File "/tmp/tmpojdd_ofp/r2/cr2yryra4s7c3n442xzvtykshgmgrlfa3nxm7rbhyhjqkt56eqyd.py", line 163, in <module>
2025-12-04T10:35:19.7587760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7587863Z     kernel.precompile(
2025-12-04T10:35:19.7588339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7588436Z     self._precompile_worker()
2025-12-04T10:35:19.7588951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7589103Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7589622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7589804Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7590195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7590420Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7590806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7591106Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7591317Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7591623Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7591719Z ^
2025-12-04T10:35:19.7592119Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7592128Z 
2025-12-04T10:35:19.7592740Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7592757Z 
2025-12-04T10:35:19.7592761Z 
2025-12-04T10:35:19.7592949Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7593726Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7593731Z 
2025-12-04T10:35:19.7593978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7594168Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7594268Z frames [('total', 1)]
2025-12-04T10:35:19.7594368Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7594581Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7594782Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7594867Z graph_break []
2025-12-04T10:35:19.7595054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7595157Z frames [('total', 1)]
2025-12-04T10:35:19.7595256Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7595470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7595709Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7595799Z graph_break []
2025-12-04T10:35:19.7595935Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7596215Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7596319Z Traceback (most recent call last):
2025-12-04T10:35:19.7596713Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7597000Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7597420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7597649Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7598085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7598266Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7598713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7598837Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7599299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7599575Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7600041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7600164Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7600580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7600699Z     return self._compile_to_module()
2025-12-04T10:35:19.7601113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7601249Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7601711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7601825Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7602259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7602462Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7602972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7603089Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7603612Z   File "/tmp/tmpfhn5ysyr/tc/ctcm6vvqjjfjontvq47nev7ixgj7avam3r4r7ncj4rlc6mie2y2m.py", line 163, in <module>
2025-12-04T10:35:19.7604030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7604125Z     kernel.precompile(
2025-12-04T10:35:19.7604605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7604718Z     self._precompile_worker()
2025-12-04T10:35:19.7605230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7605391Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7605949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7606130Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7606538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7606746Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7607125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7607429Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7607712Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7608220Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7608295Z ^
2025-12-04T10:35:19.7608688Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7608693Z 
2025-12-04T10:35:19.7609315Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7609319Z 
2025-12-04T10:35:19.7609323Z 
2025-12-04T10:35:19.7609512Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7610218Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7610228Z 
2025-12-04T10:35:19.7610464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7610647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7610743Z frames [('total', 1)]
2025-12-04T10:35:19.7610846Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7611063Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7611258Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7611343Z graph_break []
2025-12-04T10:35:19.7611539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7611632Z frames [('total', 1)]
2025-12-04T10:35:19.7611731Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7611934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7612127Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7612216Z graph_break []
2025-12-04T10:35:19.7612408Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7612495Z frames [('total', 1)]
2025-12-04T10:35:19.7612601Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7612784Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7612985Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7613080Z graph_break []
2025-12-04T10:35:19.7613798Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml -
2025-12-04T10:35:19.7613946Z =========================== short test summary info ============================
2025-12-04T10:35:19.7614630Z FAILED [0.4405s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7614950Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7615042Z ^
2025-12-04T10:35:19.7615460Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7615466Z 
2025-12-04T10:35:19.7616111Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7616123Z 
2025-12-04T10:35:19.7616130Z 
2025-12-04T10:35:19.7616321Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7617010Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7617014Z 
2025-12-04T10:35:19.7617258Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7617519Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7617701Z =================== 1 failed, 2 deselected, 2 rerun in 2.97s ===================
2025-12-04T10:35:19.7617783Z Got exit code 1
2025-12-04T10:35:19.7617874Z Retrying single test...
2025-12-04T10:35:19.7618299Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml
2025-12-04T10:35:19.7618443Z ============================= test session starts ==============================
2025-12-04T10:35:19.7618750Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7618856Z cachedir: .pytest_cache
2025-12-04T10:35:19.7619373Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7619492Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7619590Z configfile: pytest.ini
2025-12-04T10:35:19.7620054Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7620254Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7620868Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7620961Z Running 1 items in this shard
2025-12-04T10:35:19.7620973Z 
2025-12-04T10:35:19.7621972Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7622662Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7623060Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7623530Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7624011Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7624586Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7624961Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7625475Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7625923Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7626407Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7626838Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7627247Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7627633Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7628122Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7628580Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7629059Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7629503Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7629983Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7630292Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7631947Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7632409Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7633321Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7633854Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7634634Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7635227Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7636033Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7636794Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7637325Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7638017Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7638329Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7639108Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7639226Z ('RERUN', {'yellow': True}) [2.0392s] [100%]
2025-12-04T10:35:19.7640217Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7640905Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7641372Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7641843Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7642320Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7642816Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7643184Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7643695Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7644153Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7644629Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7645079Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7645474Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7645903Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7646399Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7646783Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7647280Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7647726Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7648272Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7648597Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7650225Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7650700Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7651598Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7652158Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7653000Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7653590Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7654355Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7655007Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7655539Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7696191Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7696662Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7697540Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7697694Z ('RERUN', {'yellow': True}) [0.4419s] [100%]
2025-12-04T10:35:19.7698948Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7699742Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7700149Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7700643Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7701385Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7701868Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7702231Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7702731Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7703178Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7703638Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7704071Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7704463Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7704836Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7705321Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7705772Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7706253Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7706698Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7707162Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7707461Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7709580Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7710045Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7710943Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7711481Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7712241Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7712943Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7713847Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7714508Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7715023Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7715754Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7716062Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7716825Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7716920Z FAILED [0.4429s] [100%]
2025-12-04T10:35:19.7716925Z 
2025-12-04T10:35:19.7717048Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7717327Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7717437Z Traceback (most recent call last):
2025-12-04T10:35:19.7717964Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7718174Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7718587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7718799Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7719249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7719411Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7719845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7719972Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7720424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7720715Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7721154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7721278Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7721700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7721798Z     return self._compile_to_module()
2025-12-04T10:35:19.7722214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7722350Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7722790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7722913Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7723332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7723526Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7724027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7724216Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7724678Z   File "/tmp/tmplkzjjexn/ug/cugurnnkcfghzbzzd3fafveiff4uhmjrkd4vn7ysnlwpanfbujj6.py", line 163, in <module>
2025-12-04T10:35:19.7725070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7725169Z     kernel.precompile(
2025-12-04T10:35:19.7725697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7725798Z     self._precompile_worker()
2025-12-04T10:35:19.7726310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7726461Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7726964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7727136Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7727511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7727718Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7728098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7728461Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7728657Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7728956Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7729024Z ^
2025-12-04T10:35:19.7729420Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7729425Z 
2025-12-04T10:35:19.7730039Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7730044Z 
2025-12-04T10:35:19.7730048Z 
2025-12-04T10:35:19.7730238Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7730924Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7730934Z 
2025-12-04T10:35:19.7731162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7731346Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7731435Z frames [('total', 1)]
2025-12-04T10:35:19.7731538Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7731733Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7731921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7732009Z graph_break []
2025-12-04T10:35:19.7732283Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7732396Z Traceback (most recent call last):
2025-12-04T10:35:19.7732779Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7732990Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7733408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7733613Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7734047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7734296Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7734728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7734859Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7735315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7735640Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7736085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7736211Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7736621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7736721Z     return self._compile_to_module()
2025-12-04T10:35:19.7737135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7737275Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7737713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7737818Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7738317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7738513Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7739022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7739174Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7739596Z   File "/tmp/tmp22hhy_li/yb/cybumfw22y3yq23jtnnhbvispu7667uveuil3ivdjynahedb4qvv.py", line 163, in <module>
2025-12-04T10:35:19.7739996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7740084Z     kernel.precompile(
2025-12-04T10:35:19.7740565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7740665Z     self._precompile_worker()
2025-12-04T10:35:19.7741182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7741335Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7741840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7742004Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7742398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7742606Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7742982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7743266Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7743464Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7743776Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7743846Z ^
2025-12-04T10:35:19.7744236Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7744241Z 
2025-12-04T10:35:19.7744931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7744937Z 
2025-12-04T10:35:19.7744941Z 
2025-12-04T10:35:19.7745120Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7745859Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7745868Z 
2025-12-04T10:35:19.7746095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7746282Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7746372Z frames [('total', 1)]
2025-12-04T10:35:19.7746466Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7746666Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7746849Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7746937Z graph_break []
2025-12-04T10:35:19.7747120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7747207Z frames [('total', 1)]
2025-12-04T10:35:19.7747310Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7747491Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7747682Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7747850Z graph_break []
2025-12-04T10:35:19.7747969Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7748241Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7748346Z Traceback (most recent call last):
2025-12-04T10:35:19.7748723Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7748937Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7749358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7749572Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7750016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7750176Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7750621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7750745Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7751195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7751477Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7751922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7752048Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7752467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7752567Z     return self._compile_to_module()
2025-12-04T10:35:19.7752979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7753121Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7753556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7753669Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7754089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7754395Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7754897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7755003Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7755430Z   File "/tmp/tmp_82ow0f2/q7/cq7ppvjv4btm7rjw7xmfl7sytnqxbsrzcio55evixvrdjwqjjdiy.py", line 163, in <module>
2025-12-04T10:35:19.7755827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7755920Z     kernel.precompile(
2025-12-04T10:35:19.7756394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7756488Z     self._precompile_worker()
2025-12-04T10:35:19.7757000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7757152Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7757654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7757827Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7758204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7758494Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7758869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7759149Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7759346Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7759655Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7759728Z ^
2025-12-04T10:35:19.7760123Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7760128Z 
2025-12-04T10:35:19.7760737Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7760750Z 
2025-12-04T10:35:19.7760753Z 
2025-12-04T10:35:19.7760945Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7761632Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7761637Z 
2025-12-04T10:35:19.7761873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7762063Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7762149Z frames [('total', 1)]
2025-12-04T10:35:19.7762253Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7762450Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7762638Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7762733Z graph_break []
2025-12-04T10:35:19.7762920Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7763009Z frames [('total', 1)]
2025-12-04T10:35:19.7763107Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7763292Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7763496Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7763580Z graph_break []
2025-12-04T10:35:19.7763757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7763936Z frames [('total', 1)]
2025-12-04T10:35:19.7764030Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7764214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7764416Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7764493Z graph_break []
2025-12-04T10:35:19.7765061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml -
2025-12-04T10:35:19.7765207Z =========================== short test summary info ============================
2025-12-04T10:35:19.7765931Z FAILED [0.4429s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7766236Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7766309Z ^
2025-12-04T10:35:19.7766714Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7766719Z 
2025-12-04T10:35:19.7767325Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7767330Z 
2025-12-04T10:35:19.7767334Z 
2025-12-04T10:35:19.7767520Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7768289Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7768294Z 
2025-12-04T10:35:19.7768519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7768683Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7768854Z ================== 1 failed, 187 deselected, 2 rerun in 2.96s ==================
2025-12-04T10:35:19.7768936Z Got exit code 1
2025-12-04T10:35:19.7769027Z Retrying single test...
2025-12-04T10:35:19.7769426Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml
2025-12-04T10:35:19.7769568Z ============================= test session starts ==============================
2025-12-04T10:35:19.7769864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7769961Z cachedir: .pytest_cache
2025-12-04T10:35:19.7770415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7770520Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7770609Z configfile: pytest.ini
2025-12-04T10:35:19.7771075Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7771264Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7771879Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7771976Z Running 1 items in this shard
2025-12-04T10:35:19.7771981Z 
2025-12-04T10:35:19.7772976Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7773670Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7774124Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7774586Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7775056Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7775544Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7775964Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7776460Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7776908Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7777375Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7777807Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7778199Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7778648Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7779192Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7779559Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7780043Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7780482Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7780941Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7781254Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7782886Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7783346Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7784237Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7784775Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7785581Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7786239Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7786986Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7787636Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7788160Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7788829Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7789138Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7789899Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7790010Z ('RERUN', {'yellow': True}) [2.0518s] [100%]
2025-12-04T10:35:19.7791078Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7791747Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7792127Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7792583Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7793059Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7793537Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7793895Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7794394Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7794840Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7795314Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7795744Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7796135Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7796507Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7796980Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7797355Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7797932Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7798386Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7798842Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7799147Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7800783Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7801239Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7802137Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7802750Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7803512Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7804095Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7804845Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7805537Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7806068Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7806755Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7807061Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7807959Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7808073Z ('RERUN', {'yellow': True}) [0.4423s] [100%]
2025-12-04T10:35:19.7809062Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7809738Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7810231Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 40960
2025-12-04T10:35:19.7810732Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7811236Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7811760Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7812150Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.7812684Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7813168Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7813664Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7814137Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7814651Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7815024Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.7815518Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7815920Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.7816412Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7816852Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7817309Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7817617Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7819303Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7819765Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7820650Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7821193Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7822030Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7822611Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7823355Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7824012Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7824533Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7825206Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7825516Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7826274Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7826443Z FAILED [0.4427s] [100%]
2025-12-04T10:35:19.7826448Z 
2025-12-04T10:35:19.7826567Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7826842Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7826949Z Traceback (most recent call last):
2025-12-04T10:35:19.7827331Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7827545Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7827964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7828173Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7828613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7828779Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7829213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7829339Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7829794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7830071Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7830515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7830636Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7831047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7831148Z     return self._compile_to_module()
2025-12-04T10:35:19.7831571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7831706Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7832144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7832259Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7832759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7832959Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7833460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7833564Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7834011Z   File "/tmp/tmpmpmtyyg1/od/codwrqcbdntqen3knoeeafd6qjno45k4qvwyjg6fbt2te2lvy5gk.py", line 163, in <module>
2025-12-04T10:35:19.7834407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7834501Z     kernel.precompile(
2025-12-04T10:35:19.7834980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7835076Z     self._precompile_worker()
2025-12-04T10:35:19.7835729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7835879Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7836383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7836555Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7837047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7837248Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7837623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7837907Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7838105Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7838413Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7838485Z ^
2025-12-04T10:35:19.7838881Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7838886Z 
2025-12-04T10:35:19.7839493Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7839504Z 
2025-12-04T10:35:19.7839508Z 
2025-12-04T10:35:19.7839700Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7840389Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7840394Z 
2025-12-04T10:35:19.7840629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7840809Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7840895Z frames [('total', 1)]
2025-12-04T10:35:19.7841000Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7841203Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7841387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7841482Z graph_break []
2025-12-04T10:35:19.7841756Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7841862Z Traceback (most recent call last):
2025-12-04T10:35:19.7842250Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7842453Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7842953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7843166Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7843600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7843768Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7844200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7844332Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7844784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7845059Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7845522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7845660Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7846091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7846199Z     return self._compile_to_module()
2025-12-04T10:35:19.7846609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7846829Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7847265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7847371Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7847792Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7847985Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7848493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7848600Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7849040Z   File "/tmp/tmprcltxuxc/iw/ciwlj6ht3fp3sbsrqwzcp3tnyqgfl7zs5nrmmycc3hh66kupfm2e.py", line 163, in <module>
2025-12-04T10:35:19.7849436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7849530Z     kernel.precompile(
2025-12-04T10:35:19.7849998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7850102Z     self._precompile_worker()
2025-12-04T10:35:19.7850609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7850769Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7851272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7851439Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7851826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7852035Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7852416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7852700Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7852895Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7853208Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7853367Z ^
2025-12-04T10:35:19.7853759Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7853770Z 
2025-12-04T10:35:19.7854378Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7854384Z 
2025-12-04T10:35:19.7854392Z 
2025-12-04T10:35:19.7854576Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7855275Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7855280Z 
2025-12-04T10:35:19.7855504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7855694Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7855784Z frames [('total', 1)]
2025-12-04T10:35:19.7855881Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7856088Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7856272Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7856352Z graph_break []
2025-12-04T10:35:19.7856534Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7856695Z frames [('total', 1)]
2025-12-04T10:35:19.7856793Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7856980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7857173Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7857262Z graph_break []
2025-12-04T10:35:19.7857382Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7857655Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7857768Z Traceback (most recent call last):
2025-12-04T10:35:19.7858149Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7858363Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7858778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7858991Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7859489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7859653Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7860083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7860213Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7860673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7860950Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7861388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7861518Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7861930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7862029Z     return self._compile_to_module()
2025-12-04T10:35:19.7862449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7862586Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7863115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7863232Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7863650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7863846Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7864358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7864467Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7864929Z   File "/tmp/tmpub9pboc1/mh/cmhc5lgbpxu6y6kvpy4pvjjbgwojfgoaowpoqn6xducagndxdhxr.py", line 163, in <module>
2025-12-04T10:35:19.7865347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7865446Z     kernel.precompile(
2025-12-04T10:35:19.7865952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7866048Z     self._precompile_worker()
2025-12-04T10:35:19.7866563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7866715Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7867384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7867560Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7867942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7868165Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7868549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7868837Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7869047Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7869352Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7869430Z ^
2025-12-04T10:35:19.7869824Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7869829Z 
2025-12-04T10:35:19.7870434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7870438Z 
2025-12-04T10:35:19.7870442Z 
2025-12-04T10:35:19.7870634Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7871332Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7871337Z 
2025-12-04T10:35:19.7871576Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7871762Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7871854Z frames [('total', 1)]
2025-12-04T10:35:19.7871957Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7872154Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7872340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7872432Z graph_break []
2025-12-04T10:35:19.7872608Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7872704Z frames [('total', 1)]
2025-12-04T10:35:19.7872800Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7873066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7873273Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7873359Z graph_break []
2025-12-04T10:35:19.7873539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7873642Z frames [('total', 1)]
2025-12-04T10:35:19.7873740Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7873933Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7874146Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7874230Z graph_break []
2025-12-04T10:35:19.7874817Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml -
2025-12-04T10:35:19.7874967Z =========================== short test summary info ============================
2025-12-04T10:35:19.7875701Z FAILED [0.4427s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7876027Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7876106Z ^
2025-12-04T10:35:19.7876515Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7876628Z 
2025-12-04T10:35:19.7877241Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7877245Z 
2025-12-04T10:35:19.7877249Z 
2025-12-04T10:35:19.7877442Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7878150Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7878155Z 
2025-12-04T10:35:19.7878386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7878548Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7878721Z ================== 1 failed, 187 deselected, 2 rerun in 2.97s ==================
2025-12-04T10:35:19.7878807Z Got exit code 1
2025-12-04T10:35:19.7879302Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7879666Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.7880079Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml
2025-12-04T10:35:19.7880221Z ============================= test session starts ==============================
2025-12-04T10:35:19.7880522Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7880627Z cachedir: .pytest_cache
2025-12-04T10:35:19.7881083Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7881202Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7881296Z configfile: pytest.ini
2025-12-04T10:35:19.7881774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7881984Z collecting ... collected 188 items / 3 deselected / 185 selected
2025-12-04T10:35:19.7882105Z stepcurrent: skipping 3 already run items.
2025-12-04T10:35:19.7882206Z Running 185 items in this shard
2025-12-04T10:35:19.7882210Z 
2025-12-04T10:35:19.7883379Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7884290Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7884675Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7885063Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.7885471Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7885924Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7886393Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7886908Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7887405Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7887966Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7888358Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.7888900Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.7889360Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7889822Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.7890331Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.7890791Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.7891245Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7891677Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.7892090Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.7892506Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.7893156Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.7893613Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7894115Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7894680Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.7895173Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.7895649Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7896098Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.7896490Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.7896976Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.7897377Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.7897868Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.7898335Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.7898940Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.7899548Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.7900155Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.7900461Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7902461Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7902934Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7903853Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7904388Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7905165Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7905801Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7906555Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7907293Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7907998Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7908911Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7909235Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7910015Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7910129Z ('RERUN', {'yellow': True}) [1.7100s] [  0%]
2025-12-04T10:35:19.7911215Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7912221Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7912581Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7912978Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.7913377Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7913853Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7914314Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7914817Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7915329Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7915804Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7916208Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.7916744Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.7917209Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7917671Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.7918163Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.7918627Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.7919208Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7919645Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.7920054Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.7920451Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.7921107Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.7921549Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7922063Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7922547Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.7923019Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.7923563Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7923971Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.7924366Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.7924856Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.7925243Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.7925791Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.7926265Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.7926878Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.7927364Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.7927983Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.7928297Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7930368Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7930846Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7931740Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7932292Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7933051Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7933644Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7934393Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7935066Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7935659Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7936614Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7936924Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7937700Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7937830Z ('RERUN', {'yellow': True}) [0.2696s] [  0%]
2025-12-04T10:35:19.7938903Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7939855Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7940219Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.7940609Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.7940997Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.7941457Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7941978Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7942622Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7943406Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7943902Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7944293Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.7944845Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.7945294Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7945821Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.7946316Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.7946777Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.7947238Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7947741Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.7948166Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.7948565Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.7949217Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.7949662Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7950162Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7950664Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.7951137Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.7951590Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7952011Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.7952397Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.7952904Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.7953291Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.7953785Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.7954247Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.7954923Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.7955440Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.7956061Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.7956380Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.7958372Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7958841Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7959810Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7960350Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7961117Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7961695Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7962456Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7963120Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7963648Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7964547Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7964867Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7965679Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7965767Z FAILED [0.2672s] [  0%]
2025-12-04T10:35:19.7965780Z 
2025-12-04T10:35:19.7965899Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7966178Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.7966283Z Traceback (most recent call last):
2025-12-04T10:35:19.7966771Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7966976Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7967399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7967608Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7968064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7968228Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7968662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7968790Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7969251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7969527Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7969982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7970103Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7970594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7970696Z     return self._compile_to_module()
2025-12-04T10:35:19.7971106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7971250Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7971689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7971811Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7972231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7972424Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7972941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7973052Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7973482Z   File "/tmp/tmp2tcz4hf_/r2/cr2e2rloto7skiacnbdby5e3xtqlzcpjwobmouy2pw6iv43ft3p7.py", line 62, in <module>
2025-12-04T10:35:19.7973886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7973977Z     kernel.precompile(
2025-12-04T10:35:19.7974469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7974565Z     self._precompile_worker()
2025-12-04T10:35:19.7975078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7975247Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7975795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7975990Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7976371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7976585Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7976975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7977342Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7977554Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7978077Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7978158Z ^
2025-12-04T10:35:19.7978578Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7978582Z 
2025-12-04T10:35:19.7979318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7979324Z 
2025-12-04T10:35:19.7979327Z 
2025-12-04T10:35:19.7979524Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7980224Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.7980229Z 
2025-12-04T10:35:19.7980455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7980646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7980730Z frames [('total', 1)]
2025-12-04T10:35:19.7980927Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7981132Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7981322Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7986198Z graph_break []
2025-12-04T10:35:19.7986499Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.7986603Z Traceback (most recent call last):
2025-12-04T10:35:19.7987000Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7987204Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7987626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7987835Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7988266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7988439Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7988867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7988984Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7989436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7989716Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7990156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7990275Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7990677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7990785Z     return self._compile_to_module()
2025-12-04T10:35:19.7991192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7991332Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7991772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7991878Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7993106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7993309Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7993807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7993922Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7994351Z   File "/tmp/tmp18ig6t68/y6/cy6tnzq77225ilakmhbf4p42xssnjrdohzdhakzjxu64qimkmlkw.py", line 62, in <module>
2025-12-04T10:35:19.7994748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7994836Z     kernel.precompile(
2025-12-04T10:35:19.7995306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7995406Z     self._precompile_worker()
2025-12-04T10:35:19.7995917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7996065Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7996566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7996814Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7997198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7997400Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7997769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7998050Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7998244Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7998764Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7998833Z ^
2025-12-04T10:35:19.7999218Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7999231Z 
2025-12-04T10:35:19.7999837Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7999842Z 
2025-12-04T10:35:19.7999846Z 
2025-12-04T10:35:19.8000025Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8000711Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8000716Z 
2025-12-04T10:35:19.8000937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8001117Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8001199Z frames [('total', 1)]
2025-12-04T10:35:19.8001292Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8001499Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8001683Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8001760Z graph_break []
2025-12-04T10:35:19.8001936Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8002019Z frames [('total', 1)]
2025-12-04T10:35:19.8002112Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8002292Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8002570Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8002654Z graph_break []
2025-12-04T10:35:19.8002768Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8003037Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8003143Z Traceback (most recent call last):
2025-12-04T10:35:19.8003522Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8003729Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8004141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8004345Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8004780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8004943Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8005373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8005520Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8005994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8006372Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8006809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8006927Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8007332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8007436Z     return self._compile_to_module()
2025-12-04T10:35:19.8008082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8008219Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8008657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8008769Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8009184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8009374Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8009870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8009971Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8010406Z   File "/tmp/tmpti48yo5m/ok/cok64jrkydt6lqpflqurrdhle3vr5z4rjecaw6aeine4jc6sejas.py", line 62, in <module>
2025-12-04T10:35:19.8010796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8010884Z     kernel.precompile(
2025-12-04T10:35:19.8011355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8011450Z     self._precompile_worker()
2025-12-04T10:35:19.8011955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8012101Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8012603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8012768Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8013281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8013487Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8013862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8014140Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8014338Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8014852Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8014923Z ^
2025-12-04T10:35:19.8015329Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8015335Z 
2025-12-04T10:35:19.8015977Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8015982Z 
2025-12-04T10:35:19.8015986Z 
2025-12-04T10:35:19.8016169Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8016845Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8016958Z 
2025-12-04T10:35:19.8017182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8017358Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8017439Z frames [('total', 1)]
2025-12-04T10:35:19.8017535Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8017732Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8017920Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8018000Z graph_break []
2025-12-04T10:35:19.8018173Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8018253Z frames [('total', 1)]
2025-12-04T10:35:19.8018347Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8018526Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8018726Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8018802Z graph_break []
2025-12-04T10:35:19.8018974Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8019109Z frames [('total', 1)]
2025-12-04T10:35:19.8019202Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8019380Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8019574Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8019659Z graph_break []
2025-12-04T10:35:19.8020310Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml -
2025-12-04T10:35:19.8020507Z =========================== short test summary info ============================
2025-12-04T10:35:19.8021396Z FAILED [0.2672s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8021934Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8022009Z ^
2025-12-04T10:35:19.8022412Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8022418Z 
2025-12-04T10:35:19.8023131Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8023137Z 
2025-12-04T10:35:19.8023140Z 
2025-12-04T10:35:19.8023328Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8024021Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8024030Z 
2025-12-04T10:35:19.8024261Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8024419Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8024590Z =================== 1 failed, 3 deselected, 2 rerun in 2.28s ===================
2025-12-04T10:35:19.8024674Z Got exit code 1
2025-12-04T10:35:19.8024771Z Retrying single test...
2025-12-04T10:35:19.8025183Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml
2025-12-04T10:35:19.8025347Z ============================= test session starts ==============================
2025-12-04T10:35:19.8025679Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8025774Z cachedir: .pytest_cache
2025-12-04T10:35:19.8026230Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8026417Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8026509Z configfile: pytest.ini
2025-12-04T10:35:19.8026979Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8027170Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8027797Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8027901Z Running 1 items in this shard
2025-12-04T10:35:19.8027906Z 
2025-12-04T10:35:19.8028981Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8029892Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8030262Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8030652Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.8031049Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8031513Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8031976Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8032482Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8032985Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8033471Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8033944Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.8034584Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8035034Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8035506Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8036002Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8036465Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8036923Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8037342Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.8037760Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.8038264Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.8038959Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8039419Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8039955Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8040469Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8040964Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8041444Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8041877Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8042288Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.8042808Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8043216Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.8043736Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8044228Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8044872Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8045387Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8046154Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8046469Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8048471Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8048942Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8049844Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8050567Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8051332Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8051926Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8052682Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8053350Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8053878Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8054783Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8055108Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8055927Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8056045Z ('RERUN', {'yellow': True}) [1.7094s] [100%]
2025-12-04T10:35:19.8057131Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8058032Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8058480Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8058866Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.8059319Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8059786Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8060253Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8060752Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8061254Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8061734Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8062118Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.8062739Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8063185Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8063651Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8064152Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8064607Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8065061Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8065510Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.8065951Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.8066350Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.8067001Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8067445Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8067950Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8068450Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8068924Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8069371Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8069867Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8070261Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.8070755Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8071148Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.8071640Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8072105Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8072713Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8073205Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8073810Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8074197Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8076197Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8076663Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8077560Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8078101Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8078869Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8079452Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8080208Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8080872Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8081397Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8082376Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8082694Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8083461Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8083579Z ('RERUN', {'yellow': True}) [0.2668s] [100%]
2025-12-04T10:35:19.8084649Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8085547Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8085915Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8086377Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.8086773Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8087231Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8087704Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8088207Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8088705Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8089186Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8089570Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.8090107Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8090560Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8091023Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8091520Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8091978Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8092432Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8092855Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.8093369Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.8093771Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.8094415Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8094862Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8095365Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8095853Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8096339Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8096786Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8097202Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8097594Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.8098200Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8098592Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.8099130Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8099596Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8100200Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8100695Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8101300Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8101609Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8103619Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8104084Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8105062Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8105605Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8106370Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8106961Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8107720Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8108793Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8109319Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8110226Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8110661Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8111432Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8111521Z FAILED [0.2661s] [100%]
2025-12-04T10:35:19.8111525Z 
2025-12-04T10:35:19.8111655Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8111942Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8112048Z Traceback (most recent call last):
2025-12-04T10:35:19.8112439Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8112658Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8113079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8113297Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8113738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8113906Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8114350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8114475Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8114940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8115219Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8115679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8115806Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8116222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8116330Z     return self._compile_to_module()
2025-12-04T10:35:19.8116854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8116997Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8117444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8117556Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8117985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8118192Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8118697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8118809Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8119230Z   File "/tmp/tmptkyk_avr/u6/cu6mnqj6wdu6zrod277mtym6qctsmbz7osjsp6k62riedppwvahg.py", line 62, in <module>
2025-12-04T10:35:19.8119639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8119733Z     kernel.precompile(
2025-12-04T10:35:19.8120214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8120316Z     self._precompile_worker()
2025-12-04T10:35:19.8120829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8121067Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8121582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8121752Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8122147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8122357Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8122739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8123031Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8123232Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8123767Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8123843Z ^
2025-12-04T10:35:19.8124239Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8124245Z 
2025-12-04T10:35:19.8124865Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8124870Z 
2025-12-04T10:35:19.8124874Z 
2025-12-04T10:35:19.8125061Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8125804Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8125814Z 
2025-12-04T10:35:19.8126045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8126233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8126323Z frames [('total', 1)]
2025-12-04T10:35:19.8126422Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8126630Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8126823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8126989Z graph_break []
2025-12-04T10:35:19.8127273Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8127379Z Traceback (most recent call last):
2025-12-04T10:35:19.8127766Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8127978Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8128404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8128620Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8129061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8129229Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8129676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8129802Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8130265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8130546Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8131074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8131203Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8131615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8131718Z     return self._compile_to_module()
2025-12-04T10:35:19.8132136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8132280Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8132739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8132857Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8133286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8133496Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8134003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8134117Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8134553Z   File "/tmp/tmpgc4mg235/w4/cw47ikhnavo7czt2ms3l43nhty4ktuivme76puqsb7f7ng4a6gm2.py", line 62, in <module>
2025-12-04T10:35:19.8134964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8135070Z     kernel.precompile(
2025-12-04T10:35:19.8135577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8135702Z     self._precompile_worker()
2025-12-04T10:35:19.8136231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8136392Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8136921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8137092Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8137481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8137808Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8138192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8138490Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8138690Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8139284Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8139367Z ^
2025-12-04T10:35:19.8139766Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8139771Z 
2025-12-04T10:35:19.8140395Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8140404Z 
2025-12-04T10:35:19.8140408Z 
2025-12-04T10:35:19.8140599Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8141289Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8141294Z 
2025-12-04T10:35:19.8141533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8141802Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8141901Z frames [('total', 1)]
2025-12-04T10:35:19.8142000Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8142211Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8142415Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8142504Z graph_break []
2025-12-04T10:35:19.8142697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8142794Z frames [('total', 1)]
2025-12-04T10:35:19.8142896Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8143090Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8143298Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8143386Z graph_break []
2025-12-04T10:35:19.8143518Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8143807Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8143918Z Traceback (most recent call last):
2025-12-04T10:35:19.8144319Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8144532Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8144970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8145188Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8145682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8145861Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8146306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8146436Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8146901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8147182Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8147731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8147865Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8148281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8148389Z     return self._compile_to_module()
2025-12-04T10:35:19.8148811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8148966Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8149414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8149527Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8149963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8150166Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8150673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8150789Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8151228Z   File "/tmp/tmprauk3sv1/zl/czlrzt72phjozf5sfk4zefvsz32rkupmqh72sthun5kmvddyas56.py", line 62, in <module>
2025-12-04T10:35:19.8151634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8151815Z     kernel.precompile(
2025-12-04T10:35:19.8152295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8152401Z     self._precompile_worker()
2025-12-04T10:35:19.8152920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8153081Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8153597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8153773Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8154171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8154388Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8154770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8155066Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8155265Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8155807Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8155884Z ^
2025-12-04T10:35:19.8156281Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8156286Z 
2025-12-04T10:35:19.8156906Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8156916Z 
2025-12-04T10:35:19.8156920Z 
2025-12-04T10:35:19.8157109Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8157806Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8157811Z 
2025-12-04T10:35:19.8158042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8158317Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8158408Z frames [('total', 1)]
2025-12-04T10:35:19.8158507Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8158723Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8158920Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8159006Z graph_break []
2025-12-04T10:35:19.8159201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8159288Z frames [('total', 1)]
2025-12-04T10:35:19.8159389Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8159586Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8159789Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8159879Z graph_break []
2025-12-04T10:35:19.8160064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8160161Z frames [('total', 1)]
2025-12-04T10:35:19.8160263Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8160451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8160656Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8160749Z graph_break []
2025-12-04T10:35:19.8161316Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml -
2025-12-04T10:35:19.8161561Z =========================== short test summary info ============================
2025-12-04T10:35:19.8162233Z FAILED [0.2661s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8162765Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8162854Z ^
2025-12-04T10:35:19.8163255Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8163260Z 
2025-12-04T10:35:19.8163886Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8163896Z 
2025-12-04T10:35:19.8163900Z 
2025-12-04T10:35:19.8164086Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8164779Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8164791Z 
2025-12-04T10:35:19.8165026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8165188Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8165384Z ================== 1 failed, 187 deselected, 2 rerun in 2.28s ==================
2025-12-04T10:35:19.8165489Z Got exit code 1
2025-12-04T10:35:19.8165599Z Retrying single test...
2025-12-04T10:35:19.8166022Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml
2025-12-04T10:35:19.8166167Z ============================= test session starts ==============================
2025-12-04T10:35:19.8166483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8166582Z cachedir: .pytest_cache
2025-12-04T10:35:19.8167036Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8167160Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8167255Z configfile: pytest.ini
2025-12-04T10:35:19.8167812Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8168010Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8168632Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8168739Z Running 1 items in this shard
2025-12-04T10:35:19.8168751Z 
2025-12-04T10:35:19.8169830Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8170748Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8171119Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8171513Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.8171916Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8172488Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8172961Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8173463Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8173970Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8174449Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8174837Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.8175392Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8175845Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8176315Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8176820Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8177277Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8177740Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8178164Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.8178582Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.8178979Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.8179765Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8180213Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8180716Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8181218Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8181694Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8182146Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8182582Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8182976Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.8183477Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8183947Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.8184444Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8184918Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8185532Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8186028Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8186634Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8187032Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8189041Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8189509Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8190414Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8190953Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8191811Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8192398Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8193161Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8193828Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8194364Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8195297Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8195652Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8196501Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8196621Z ('RERUN', {'yellow': True}) [1.7158s] [100%]
2025-12-04T10:35:19.8197704Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8198608Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8198981Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8199374Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.8199768Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8200233Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8200706Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8201215Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8201721Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8202209Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8202594Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.8203139Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8203688Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8204156Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8204660Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8205132Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8205635Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8206065Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.8206484Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.8206892Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.8207544Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8208207Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8208724Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8209217Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8209710Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8210165Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8210588Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8210994Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.8211490Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8211894Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.8212393Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8212870Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8213476Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8213973Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8214584Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8214894Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8217076Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8217571Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8218530Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8219135Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8219915Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8220638Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8221396Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8222071Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8222600Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8223514Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8223831Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8224606Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8224725Z ('RERUN', {'yellow': True}) [0.2720s] [100%]
2025-12-04T10:35:19.8225851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8226755Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8227129Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8227517Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.8228111Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8228577Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8229043Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8229547Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8230059Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8230533Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8230922Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.8231463Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8231918Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8232387Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8232966Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8233433Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8233900Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8234326Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.8234751Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.8235154Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.8235871Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8236315Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8236830Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8237328Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8237809Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8238265Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8238688Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8239083Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = -448.0
2025-12-04T10:35:19.8239579Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8240052Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = 448.0
2025-12-04T10:35:19.8240553Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8241020Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8241636Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8242127Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8242740Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8243056Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8245057Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8245608Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8246504Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8247047Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8247819Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8248410Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8249177Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8249843Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8250372Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8251281Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8251604Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8252450Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8252550Z FAILED [0.2693s] [100%]
2025-12-04T10:35:19.8252555Z 
2025-12-04T10:35:19.8252683Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8252979Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8253091Z Traceback (most recent call last):
2025-12-04T10:35:19.8253481Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8253712Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8254136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8254358Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8254804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8254974Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8255427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8255642Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8256105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8256387Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8256843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8256985Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8257405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8257513Z     return self._compile_to_module()
2025-12-04T10:35:19.8257940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8258087Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8258539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8258655Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8259133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8259349Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8259859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8259968Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8260426Z   File "/tmp/tmp6z14z9by/kv/ckvppj2tnkky6jfblaitlix7vhwddddcua3koq3d4tlnx6m6elm7.py", line 62, in <module>
2025-12-04T10:35:19.8260828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8260940Z     kernel.precompile(
2025-12-04T10:35:19.8261421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8261523Z     self._precompile_worker()
2025-12-04T10:35:19.8262047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8262200Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8262827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8263006Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8263396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8263616Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8264002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8264293Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8264507Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8265038Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8265124Z ^
2025-12-04T10:35:19.8265534Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8265539Z 
2025-12-04T10:35:19.8266157Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8266162Z 
2025-12-04T10:35:19.8266241Z 
2025-12-04T10:35:19.8266435Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8267130Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8267134Z 
2025-12-04T10:35:19.8267375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8267567Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8267665Z frames [('total', 1)]
2025-12-04T10:35:19.8267772Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8267980Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8268183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8268269Z graph_break []
2025-12-04T10:35:19.8268550Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8268669Z Traceback (most recent call last):
2025-12-04T10:35:19.8269066Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8269280Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8269706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8269921Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8270374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8270542Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8270983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8271114Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8271582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8271875Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8272331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8272460Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8272965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8273071Z     return self._compile_to_module()
2025-12-04T10:35:19.8273491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8273637Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8274090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8274214Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8274643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8274848Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8275364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8275477Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8275929Z   File "/tmp/tmpygafdbo0/ct/cctq473uotyp5vzfipgmvvuhwlay5yshfdsmyv6eboiy62zhnwh6.py", line 62, in <module>
2025-12-04T10:35:19.8276329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8276433Z     kernel.precompile(
2025-12-04T10:35:19.8277007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8277108Z     self._precompile_worker()
2025-12-04T10:35:19.8277628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8277788Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8278310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8278486Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8278875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8279092Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8279477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8279771Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8284114Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8284668Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8284744Z ^
2025-12-04T10:35:19.8285156Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8285162Z 
2025-12-04T10:35:19.8285826Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8285831Z 
2025-12-04T10:35:19.8285835Z 
2025-12-04T10:35:19.8286028Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8286727Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8286732Z 
2025-12-04T10:35:19.8286966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8287159Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8287250Z frames [('total', 1)]
2025-12-04T10:35:19.8287354Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8287670Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8287867Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8287958Z graph_break []
2025-12-04T10:35:19.8288144Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8288233Z frames [('total', 1)]
2025-12-04T10:35:19.8288346Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8288537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8288739Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8288827Z graph_break []
2025-12-04T10:35:19.8288958Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8289241Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8289348Z Traceback (most recent call last):
2025-12-04T10:35:19.8289743Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8289960Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8290383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8290601Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8291131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8291299Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8291741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8291868Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8292342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8292626Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8293080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8293213Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8293635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8293740Z     return self._compile_to_module()
2025-12-04T10:35:19.8294158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8294305Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8294755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8294872Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8295301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8295510Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8296060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8296175Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8296598Z   File "/tmp/tmpj_4eqc11/5t/c5ty6ahpawh6bvwevrunlvix5gfgqhxerb56clsai43plrubxokf.py", line 62, in <module>
2025-12-04T10:35:19.8296997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8297099Z     kernel.precompile(
2025-12-04T10:35:19.8297658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8297759Z     self._precompile_worker()
2025-12-04T10:35:19.8298276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8298430Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8298950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8299207Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8299594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8299810Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8300190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8300483Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8300686Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8301214Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8301294Z ^
2025-12-04T10:35:19.8301773Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8301778Z 
2025-12-04T10:35:19.8302391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8302399Z 
2025-12-04T10:35:19.8302403Z 
2025-12-04T10:35:19.8302593Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8303289Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8303294Z 
2025-12-04T10:35:19.8303527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8303714Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8303806Z frames [('total', 1)]
2025-12-04T10:35:19.8303912Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8304119Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8304312Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8304395Z graph_break []
2025-12-04T10:35:19.8304579Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8304675Z frames [('total', 1)]
2025-12-04T10:35:19.8304774Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8304965Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8305170Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8305254Z graph_break []
2025-12-04T10:35:19.8305443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8305530Z frames [('total', 1)]
2025-12-04T10:35:19.8305627Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8305815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8306023Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8306105Z graph_break []
2025-12-04T10:35:19.8306681Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml -
2025-12-04T10:35:19.8306830Z =========================== short test summary info ============================
2025-12-04T10:35:19.8307620Z FAILED [0.2693s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8308309Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8308387Z ^
2025-12-04T10:35:19.8308789Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8308801Z 
2025-12-04T10:35:19.8309414Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8309419Z 
2025-12-04T10:35:19.8309422Z 
2025-12-04T10:35:19.8309616Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8310305Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8310310Z 
2025-12-04T10:35:19.8310545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8310701Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8310875Z ================== 1 failed, 187 deselected, 2 rerun in 2.29s ==================
2025-12-04T10:35:19.8311090Z Got exit code 1
2025-12-04T10:35:19.8311572Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8311931Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.8312346Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml
2025-12-04T10:35:19.8312487Z ============================= test session starts ==============================
2025-12-04T10:35:19.8312799Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8312893Z cachedir: .pytest_cache
2025-12-04T10:35:19.8313347Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8313458Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8313551Z configfile: pytest.ini
2025-12-04T10:35:19.8314029Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8314224Z collecting ... collected 188 items / 4 deselected / 184 selected
2025-12-04T10:35:19.8314346Z stepcurrent: skipping 4 already run items.
2025-12-04T10:35:19.8314448Z Running 184 items in this shard
2025-12-04T10:35:19.8314452Z 
2025-12-04T10:35:19.8315499Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8316214Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8316614Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8317088Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8317572Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8318056Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8318539Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8319048Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8319497Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8319978Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8320412Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8320811Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8321191Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8321675Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8322050Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8322612Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8323066Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8323530Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8323840Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8325514Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8326008Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8326909Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8327452Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8328218Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8328806Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8329562Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8330307Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8330831Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8331516Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8331831Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8332600Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8332714Z ('RERUN', {'yellow': True}) [2.0916s] [  0%]
2025-12-04T10:35:19.8333725Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8334404Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8334878Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8335345Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8335823Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8336313Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8336680Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8337184Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8337634Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8338114Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8338550Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8338945Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8339376Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8339863Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8340235Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8340723Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8341169Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8341636Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8342026Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8343664Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8344132Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8345030Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8345611Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8346385Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8347073Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8347826Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8348497Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8349020Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8349699Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8350024Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8350788Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8350903Z ('RERUN', {'yellow': True}) [0.4729s] [  0%]
2025-12-04T10:35:19.8351910Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8352592Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8352991Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8353456Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8353940Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8354504Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8354875Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8355383Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8355834Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8356306Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8356740Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8357148Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8357522Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8358004Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8358384Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8358944Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8359397Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8359861Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8360172Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8361806Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8362272Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8363175Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8363715Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8364480Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8365068Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8365825Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8366564Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8367093Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8367773Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8368088Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8368855Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8368944Z FAILED [0.4722s] [  0%]
2025-12-04T10:35:19.8368952Z 
2025-12-04T10:35:19.8369081Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8369368Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8369474Z Traceback (most recent call last):
2025-12-04T10:35:19.8369866Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8370165Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8370583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8370802Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8371243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8371418Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8371858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8371985Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8372447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8372734Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8373183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8373311Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8373725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8373833Z     return self._compile_to_module()
2025-12-04T10:35:19.8374252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8374397Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8374844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8374954Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8375388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8375589Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8376093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8376206Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8376717Z   File "/tmp/tmpr8r26_lm/6y/c6ykhis2ft6fc7sjdns64at5bavcwegprgynyfqkhmobcqcs532z.py", line 168, in <module>
2025-12-04T10:35:19.8377125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8377220Z     kernel.precompile(
2025-12-04T10:35:19.8377697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8377800Z     self._precompile_worker()
2025-12-04T10:35:19.8378318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8378474Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8378987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8379208Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8379601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8379813Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8380194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8380488Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8380769Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8381079Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8381155Z ^
2025-12-04T10:35:19.8381551Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8381556Z 
2025-12-04T10:35:19.8382182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8382187Z 
2025-12-04T10:35:19.8382190Z 
2025-12-04T10:35:19.8382378Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8383080Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8383090Z 
2025-12-04T10:35:19.8383320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8383510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8383598Z frames [('total', 1)]
2025-12-04T10:35:19.8383696Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8383903Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8384094Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8384178Z graph_break []
2025-12-04T10:35:19.8384472Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8384578Z Traceback (most recent call last):
2025-12-04T10:35:19.8384965Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8385177Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8385645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8385867Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8386309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8386475Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8386999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8387126Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8387588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8387867Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8388314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8388450Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8388862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8388967Z     return self._compile_to_module()
2025-12-04T10:35:19.8389391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8389536Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8389982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8390094Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8390518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8390821Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8391327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8391439Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8391884Z   File "/tmp/tmpb7zf3v9d/3c/c3cdzfioke7fv46octmqsd53fsmncxohaogbdcg6zem3d4r5omkj.py", line 168, in <module>
2025-12-04T10:35:19.8392287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8392384Z     kernel.precompile(
2025-12-04T10:35:19.8392863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8392963Z     self._precompile_worker()
2025-12-04T10:35:19.8393478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8393639Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8394155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8394327Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8394712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8394929Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8395309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8395605Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8395801Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8396109Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8396191Z ^
2025-12-04T10:35:19.8396586Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8396591Z 
2025-12-04T10:35:19.8397212Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8397217Z 
2025-12-04T10:35:19.8397220Z 
2025-12-04T10:35:19.8397490Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8398188Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8398197Z 
2025-12-04T10:35:19.8398428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8398618Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8398709Z frames [('total', 1)]
2025-12-04T10:35:19.8398807Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8399012Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8399206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8399290Z graph_break []
2025-12-04T10:35:19.8399473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8399564Z frames [('total', 1)]
2025-12-04T10:35:19.8399665Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8399857Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8400055Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8400138Z graph_break []
2025-12-04T10:35:19.8400268Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8400635Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8400741Z Traceback (most recent call last):
2025-12-04T10:35:19.8401133Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8401341Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8401763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8401981Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8402423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8402591Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8403031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8403167Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8403630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8403907Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8404356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8404486Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8404898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8405004Z     return self._compile_to_module()
2025-12-04T10:35:19.8405445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8405612Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8406062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8406174Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8406601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8406799Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8407474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8407588Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8408201Z   File "/tmp/tmp0cdw70e8/fz/cfz4ycz2ldx27axtnofwsit4udseotqt5wvd7v6n6qkkfar4rkj3.py", line 168, in <module>
2025-12-04T10:35:19.8408603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8408703Z     kernel.precompile(
2025-12-04T10:35:19.8409180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8409282Z     self._precompile_worker()
2025-12-04T10:35:19.8409795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8409950Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8410468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8410638Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8411024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8411234Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8411733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8412025Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8412222Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8412531Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8412605Z ^
2025-12-04T10:35:19.8413004Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8413008Z 
2025-12-04T10:35:19.8413624Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8413629Z 
2025-12-04T10:35:19.8413632Z 
2025-12-04T10:35:19.8413818Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8414525Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8414530Z 
2025-12-04T10:35:19.8414759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8414947Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8415036Z frames [('total', 1)]
2025-12-04T10:35:19.8415138Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8415345Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8415570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8415667Z graph_break []
2025-12-04T10:35:19.8415868Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8415964Z frames [('total', 1)]
2025-12-04T10:35:19.8416069Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8416261Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8416460Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8416549Z graph_break []
2025-12-04T10:35:19.8416731Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8416820Z frames [('total', 1)]
2025-12-04T10:35:19.8416919Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8417227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8417429Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8417522Z graph_break []
2025-12-04T10:35:19.8418089Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml -
2025-12-04T10:35:19.8418239Z =========================== short test summary info ============================
2025-12-04T10:35:19.8418931Z FAILED [0.4722s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8419309Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8419393Z ^
2025-12-04T10:35:19.8419790Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8419794Z 
2025-12-04T10:35:19.8420416Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8420420Z 
2025-12-04T10:35:19.8420424Z 
2025-12-04T10:35:19.8420613Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8421308Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8421401Z 
2025-12-04T10:35:19.8421633Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8421791Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8421967Z =================== 1 failed, 4 deselected, 2 rerun in 3.07s ===================
2025-12-04T10:35:19.8422051Z Got exit code 1
2025-12-04T10:35:19.8422143Z Retrying single test...
2025-12-04T10:35:19.8422569Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml
2025-12-04T10:35:19.8422710Z ============================= test session starts ==============================
2025-12-04T10:35:19.8423015Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8423109Z cachedir: .pytest_cache
2025-12-04T10:35:19.8423571Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8423682Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8423777Z configfile: pytest.ini
2025-12-04T10:35:19.8424244Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8424440Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8425072Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8425178Z Running 1 items in this shard
2025-12-04T10:35:19.8425183Z 
2025-12-04T10:35:19.8426199Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8426891Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8427290Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8427869Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8428357Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8428841Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8429226Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8429730Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8430178Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8430660Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8431096Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8431495Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8431871Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8432437Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8432818Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8433301Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8433760Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8434226Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8434532Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8436236Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8436706Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8437606Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8438146Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8438919Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8439502Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8440338Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8441001Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8441531Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8442217Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8442526Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8443296Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8443410Z ('RERUN', {'yellow': True}) [2.0976s] [100%]
2025-12-04T10:35:19.8444417Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8445176Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8445623Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8446094Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8446572Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8447058Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8447443Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8447946Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8448396Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8448870Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8449309Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8449707Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8450081Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8450571Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8450944Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8451507Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8451958Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8452432Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8452740Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8454387Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8454853Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8455797Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8456416Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8457178Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8457768Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8458523Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8459230Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8459760Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8460446Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8460765Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8461530Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8461646Z ('RERUN', {'yellow': True}) [0.4733s] [100%]
2025-12-04T10:35:19.8462651Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8463346Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8463746Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8464294Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8464777Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8465261Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8465685Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8466188Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8466634Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8467114Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8467549Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8467957Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8468409Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8468897Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8469275Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8469759Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8470221Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8470685Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8470990Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8472645Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8473108Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8474012Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8474553Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8475321Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8476061Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8476826Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8477485Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8478014Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8478697Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8479009Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8479781Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8479873Z FAILED [0.4697s] [100%]
2025-12-04T10:35:19.8479878Z 
2025-12-04T10:35:19.8480088Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8480377Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8480484Z Traceback (most recent call last):
2025-12-04T10:35:19.8480883Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8481096Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8481523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8481743Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8482185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8482355Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8482808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8482932Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8483395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8483675Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8484137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8484265Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8484678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8484785Z     return self._compile_to_module()
2025-12-04T10:35:19.8485206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8485349Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8485800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8485916Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8486352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8486637Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8487145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8487257Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8487712Z   File "/tmp/tmpvrzblr75/si/csivzh63avudnamvcpszbph2ousqhcey6f465tkdhy7opfovkr7p.py", line 168, in <module>
2025-12-04T10:35:19.8488123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8488220Z     kernel.precompile(
2025-12-04T10:35:19.8488703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8488809Z     self._precompile_worker()
2025-12-04T10:35:19.8489323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8489484Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8490004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8490175Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8490563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8490854Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8491237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8491529Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8491729Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8492042Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8492119Z ^
2025-12-04T10:35:19.8492514Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8492519Z 
2025-12-04T10:35:19.8493137Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8493150Z 
2025-12-04T10:35:19.8493154Z 
2025-12-04T10:35:19.8493347Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8494050Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8494055Z 
2025-12-04T10:35:19.8494286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8494477Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8494570Z frames [('total', 1)]
2025-12-04T10:35:19.8494671Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8494888Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8495078Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8495161Z graph_break []
2025-12-04T10:35:19.8495481Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8495619Z Traceback (most recent call last):
2025-12-04T10:35:19.8496007Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8496220Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8496639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8496943Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8497388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8497554Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8498002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8498132Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8498597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8498873Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8499364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8499496Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8499915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8500024Z     return self._compile_to_module()
2025-12-04T10:35:19.8500444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8500588Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8501118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8501229Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8501656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8501858Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8502368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8502481Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8502934Z   File "/tmp/tmpi5dm35l0/p7/cp7v7zg5ov627f6dhxzehuuaoxtmo3ncyq2l7b25xaaenz3dsex2.py", line 168, in <module>
2025-12-04T10:35:19.8503334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8503440Z     kernel.precompile(
2025-12-04T10:35:19.8503920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8504021Z     self._precompile_worker()
2025-12-04T10:35:19.8504545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8504698Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8505220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8505416Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8505830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8506043Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8506427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8506717Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8506915Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8507225Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8507304Z ^
2025-12-04T10:35:19.8508033Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8508040Z 
2025-12-04T10:35:19.8508667Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8508672Z 
2025-12-04T10:35:19.8508676Z 
2025-12-04T10:35:19.8508864Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8509572Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8509577Z 
2025-12-04T10:35:19.8509812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8509999Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8510094Z frames [('total', 1)]
2025-12-04T10:35:19.8510192Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8510403Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8510600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8510684Z graph_break []
2025-12-04T10:35:19.8510870Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8510963Z frames [('total', 1)]
2025-12-04T10:35:19.8511063Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8511397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8511598Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8511681Z graph_break []
2025-12-04T10:35:19.8511821Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8512109Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8512214Z Traceback (most recent call last):
2025-12-04T10:35:19.8512612Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8512822Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8513247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8513461Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8513909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8514082Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8514521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8514650Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8515122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8515400Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8515849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8515976Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8516393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8516502Z     return self._compile_to_module()
2025-12-04T10:35:19.8516918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8517063Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8517508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8517721Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8518151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8518350Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8518857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8518981Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8519426Z   File "/tmp/tmpjq4t5iry/52/c523zshaeih26kv6egrdta67mhvi4uarmtxupydf5nsgcc5rtvf5.py", line 168, in <module>
2025-12-04T10:35:19.8519836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8519930Z     kernel.precompile(
2025-12-04T10:35:19.8520414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8520522Z     self._precompile_worker()
2025-12-04T10:35:19.8521037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8521192Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8521706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8521960Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8522351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8522562Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8522943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8523249Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8523447Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8523760Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8523836Z ^
2025-12-04T10:35:19.8524275Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8524288Z 
2025-12-04T10:35:19.8525088Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8525095Z 
2025-12-04T10:35:19.8525100Z 
2025-12-04T10:35:19.8525354Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8526205Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8526210Z 
2025-12-04T10:35:19.8526447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8526641Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8526735Z frames [('total', 1)]
2025-12-04T10:35:19.8526834Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8527045Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8527235Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8527324Z graph_break []
2025-12-04T10:35:19.8527513Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8527601Z frames [('total', 1)]
2025-12-04T10:35:19.8527700Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8527893Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8528199Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8528287Z graph_break []
2025-12-04T10:35:19.8528470Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8528557Z frames [('total', 1)]
2025-12-04T10:35:19.8528656Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8528844Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8529055Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8529143Z graph_break []
2025-12-04T10:35:19.8529709Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml -
2025-12-04T10:35:19.8529861Z =========================== short test summary info ============================
2025-12-04T10:35:19.8530558Z FAILED [0.4697s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8530870Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8530949Z ^
2025-12-04T10:35:19.8531344Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8531348Z 
2025-12-04T10:35:19.8531964Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8532047Z 
2025-12-04T10:35:19.8532050Z 
2025-12-04T10:35:19.8532239Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8532933Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8532942Z 
2025-12-04T10:35:19.8533179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8533341Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8533519Z ================== 1 failed, 187 deselected, 2 rerun in 3.08s ==================
2025-12-04T10:35:19.8533604Z Got exit code 1
2025-12-04T10:35:19.8533697Z Retrying single test...
2025-12-04T10:35:19.8534110Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml
2025-12-04T10:35:19.8534259Z ============================= test session starts ==============================
2025-12-04T10:35:19.8534571Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8534668Z cachedir: .pytest_cache
2025-12-04T10:35:19.8535124Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8535243Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8535336Z configfile: pytest.ini
2025-12-04T10:35:19.8535851Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8536049Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8536675Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8536783Z Running 1 items in this shard
2025-12-04T10:35:19.8536788Z 
2025-12-04T10:35:19.8537801Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8538578Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8538983Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8539506Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8540000Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8540484Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8540856Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8541365Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8541813Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8542285Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8542798Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8543200Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8543577Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8544067Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8544441Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8544923Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8545399Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8545899Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8546207Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8547857Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8548321Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8549235Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8549776Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8550634Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8551219Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8551978Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8552646Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8553174Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8553864Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8554177Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8554946Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8555167Z ('RERUN', {'yellow': True}) [2.0953s] [100%]
2025-12-04T10:35:19.8559762Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8560481Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8560881Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8561351Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8561841Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8562327Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8562695Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8563205Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8563657Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8564127Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8564568Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8564968Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8565347Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8566002Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8566374Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8566861Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8567315Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8567779Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8568097Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8569744Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8570293Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8571190Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8571730Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8572496Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8573082Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8573843Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8574507Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8575038Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8575724Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8576038Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8576867Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8577029Z ('RERUN', {'yellow': True}) [0.4735s] [100%]
2025-12-04T10:35:19.8578382Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.8579163Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8579565Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 33554432
2025-12-04T10:35:19.8580029Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8580516Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.8581000Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.8581377Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:19.8581880Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.8582327Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8582801Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.8583397Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8583799Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.8584174Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = -448.0
2025-12-04T10:35:19.8584662Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.8585036Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 448.0
2025-12-04T10:35:19.8585568Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.8586024Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.8586487Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.8586793Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8588442Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8588907Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8589805Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8590421Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8591187Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8591868Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8592638Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8593299Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8593827Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8594510Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8594818Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8595676Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8595768Z FAILED [0.4719s] [100%]
2025-12-04T10:35:19.8595773Z 
2025-12-04T10:35:19.8595903Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8596192Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8596307Z Traceback (most recent call last):
2025-12-04T10:35:19.8596701Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8596914Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8597335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8597564Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8598008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8598177Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8598617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8598748Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8599213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8599492Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8599945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8600077Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8600491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8600598Z     return self._compile_to_module()
2025-12-04T10:35:19.8601015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8601156Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8601715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8601831Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8602262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8602462Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8602976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8603089Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8603539Z   File "/tmp/tmpaa2nfmhq/kh/ckhxuv4xjnpquidpif7ji5k5ymvqhoaqeczyem62y5j6oxxc6j5y.py", line 168, in <module>
2025-12-04T10:35:19.8603943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8604037Z     kernel.precompile(
2025-12-04T10:35:19.8604523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8604626Z     self._precompile_worker()
2025-12-04T10:35:19.8605142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8605300Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8605947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8606119Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8606513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8606724Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8607110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8607403Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8607601Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8608113Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8608198Z ^
2025-12-04T10:35:19.8608597Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8608602Z 
2025-12-04T10:35:19.8609219Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8609224Z 
2025-12-04T10:35:19.8609228Z 
2025-12-04T10:35:19.8609414Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8610125Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8610131Z 
2025-12-04T10:35:19.8610362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8610551Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8610644Z frames [('total', 1)]
2025-12-04T10:35:19.8610745Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8610954Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8611150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8611236Z graph_break []
2025-12-04T10:35:19.8611528Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8611634Z Traceback (most recent call last):
2025-12-04T10:35:19.8612155Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8612369Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8612788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8613010Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8613458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8613625Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8614068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8614193Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8614658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8614945Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8615422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8615568Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8615991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8616204Z     return self._compile_to_module()
2025-12-04T10:35:19.8616623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8616765Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8617215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8617332Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8617760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8617963Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8618471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8618590Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8619082Z   File "/tmp/tmp9xykmy7e/z3/cz3bcryjjzoh3mc6awt2xnjrmwj3qds4eckfabakw2c4gjwbjwdt.py", line 168, in <module>
2025-12-04T10:35:19.8619483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8619580Z     kernel.precompile(
2025-12-04T10:35:19.8620058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8620165Z     self._precompile_worker()
2025-12-04T10:35:19.8620682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8620835Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8621352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8621528Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8621914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8622127Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8622512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8622893Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8623094Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8623405Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8623488Z ^
2025-12-04T10:35:19.8623884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8623894Z 
2025-12-04T10:35:19.8624508Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8624515Z 
2025-12-04T10:35:19.8624519Z 
2025-12-04T10:35:19.8624705Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8625436Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8625442Z 
2025-12-04T10:35:19.8625695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8625883Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8625974Z frames [('total', 1)]
2025-12-04T10:35:19.8626073Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8626277Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8626552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8626637Z graph_break []
2025-12-04T10:35:19.8626821Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8626911Z frames [('total', 1)]
2025-12-04T10:35:19.8627008Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8627196Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8627405Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8627488Z graph_break []
2025-12-04T10:35:19.8627617Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8627904Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:19.8628011Z Traceback (most recent call last):
2025-12-04T10:35:19.8628404Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8628618Z     y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8629041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8629258Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8629702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8629879Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8630320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8630445Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8630912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8631195Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8631645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8631773Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8632188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8632295Z     return self._compile_to_module()
2025-12-04T10:35:19.8632796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8632938Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8633387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8633498Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8633935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8634136Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8634642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8634755Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8635205Z   File "/tmp/tmp8qn6ym91/rw/crwjyf4tpwijz7kinlm7r5t4ht7vt25aedwffhropawklw6k7ies.py", line 168, in <module>
2025-12-04T10:35:19.8635662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8635756Z     kernel.precompile(
2025-12-04T10:35:19.8636235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8636418Z     self._precompile_worker()
2025-12-04T10:35:19.8636934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8637087Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8637603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8637773Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8638170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8638382Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8638761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8639054Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8639261Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8639574Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8639651Z ^
2025-12-04T10:35:19.8640049Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8640054Z 
2025-12-04T10:35:19.8640678Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8640682Z 
2025-12-04T10:35:19.8640686Z 
2025-12-04T10:35:19.8640872Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8641571Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8641580Z 
2025-12-04T10:35:19.8641811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8641998Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8642089Z frames [('total', 1)]
2025-12-04T10:35:19.8642187Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8642393Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8642583Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8642774Z graph_break []
2025-12-04T10:35:19.8642963Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8643052Z frames [('total', 1)]
2025-12-04T10:35:19.8643150Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8643341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8643540Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8643633Z graph_break []
2025-12-04T10:35:19.8643815Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8643902Z frames [('total', 1)]
2025-12-04T10:35:19.8644003Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8644190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8644387Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.8644474Z graph_break []
2025-12-04T10:35:19.8645049Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml -
2025-12-04T10:35:19.8645198Z =========================== short test summary info ============================
2025-12-04T10:35:19.8645897Z FAILED [0.4719s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8646288Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8646366Z ^
2025-12-04T10:35:19.8646762Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8646767Z 
2025-12-04T10:35:19.8647383Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8647388Z 
2025-12-04T10:35:19.8647396Z 
2025-12-04T10:35:19.8647581Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8648277Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8648282Z 
2025-12-04T10:35:19.8648513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8648678Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8648857Z ================== 1 failed, 187 deselected, 2 rerun in 3.07s ==================
2025-12-04T10:35:19.8648944Z Got exit code 1
2025-12-04T10:35:19.8649432Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:19.8649796Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.8650212Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml
2025-12-04T10:35:19.8650357Z ============================= test session starts ==============================
2025-12-04T10:35:19.8650660Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8650754Z cachedir: .pytest_cache
2025-12-04T10:35:19.8651215Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8651323Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8651415Z configfile: pytest.ini
2025-12-04T10:35:19.8651886Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8652080Z collecting ... collected 188 items / 5 deselected / 183 selected
2025-12-04T10:35:19.8652292Z stepcurrent: skipping 5 already run items.
2025-12-04T10:35:19.8652391Z Running 183 items in this shard
2025-12-04T10:35:19.8652396Z 
2025-12-04T10:35:19.8652833Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [1.8549s] [  0%]
2025-12-04T10:35:19.8653269Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2483s] [  1%]
2025-12-04T10:35:19.8653710Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.5542s] [  1%]
2025-12-04T10:35:19.8654144Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2751s] [  2%]
2025-12-04T10:35:19.8654589Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6147s] [  2%]
2025-12-04T10:35:19.8655052Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda ('RERUN', {'yellow': True}) [0.4122s] [  3%]
2025-12-04T10:35:19.8655514Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda ('RERUN', {'yellow': True}) [0.5546s] [  3%]
2025-12-04T10:35:19.8655957Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda FAILED [0.5286s] [  3%]
2025-12-04T10:35:19.8655961Z 
2025-12-04T10:35:19.8656092Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8656421Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8656528Z Traceback (most recent call last):
2025-12-04T10:35:19.8656873Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8657006Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8657427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8657652Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8658096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8658265Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8658711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8658843Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8659358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8659640Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8660093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8660228Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8660641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8660749Z     return self._compile_to_module()
2025-12-04T10:35:19.8661166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8661315Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8661762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8661874Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8662300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8662500Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8663094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8663209Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8663651Z   File "/tmp/tmpk65eknjk/zz/czzej76ui2htys4cgkxwwfhgvy4m3d62u4l5huiwadjiy4qnyo35.py", line 108, in <module>
2025-12-04T10:35:19.8664044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:19.8664147Z     self._wait_futures(scope)
2025-12-04T10:35:19.8664574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:19.8664678Z     kernel = result.result()
2025-12-04T10:35:19.8665057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:19.8665153Z     return self.result_fn()
2025-12-04T10:35:19.8665572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:19.8665683Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:19.8666018Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:19.8666023Z 
2025-12-04T10:35:19.8666166Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8666271Z Traceback (most recent call last):
2025-12-04T10:35:19.8666818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:19.8666903Z     result = job()
2025-12-04T10:35:19.8667415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:19.8667536Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:19.8668018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:19.8668121Z     self._precompile_worker()
2025-12-04T10:35:19.8668635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8668788Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8669303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8669478Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8669865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8670074Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8670453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8670754Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8670915Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8671289Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8671364Z ^
2025-12-04T10:35:19.8671760Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8671773Z 
2025-12-04T10:35:19.8671776Z 
2025-12-04T10:35:19.8672391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8672396Z 
2025-12-04T10:35:19.8672400Z 
2025-12-04T10:35:19.8672586Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8673306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8673312Z 
2025-12-04T10:35:19.8673545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8673732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8673825Z frames [('total', 1)]
2025-12-04T10:35:19.8673924Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8674125Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8674435Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:19.8674520Z graph_break []
2025-12-04T10:35:19.8674774Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8674878Z Traceback (most recent call last):
2025-12-04T10:35:19.8675217Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8675381Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8675825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8676044Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8676487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8676761Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8677203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8677328Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8677790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8678073Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8678522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8678651Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8679063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8679175Z     return self._compile_to_module()
2025-12-04T10:35:19.8679674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8679816Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8680267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8680379Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8680811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8681014Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8681517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8681628Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8682077Z   File "/tmp/tmpo287x3l8/ub/cubfbnb4srqrag7nakprt3xgm2a6lmhbgvdblomju257dl33rb7i.py", line 108, in <module>
2025-12-04T10:35:19.8682466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:19.8682567Z     self._wait_futures(scope)
2025-12-04T10:35:19.8682993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:19.8683092Z     kernel = result.result()
2025-12-04T10:35:19.8683567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:19.8683667Z     return self.result_fn()
2025-12-04T10:35:19.8684086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:19.8684197Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:19.8684528Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:19.8684538Z 
2025-12-04T10:35:19.8684685Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8684791Z Traceback (most recent call last):
2025-12-04T10:35:19.8685261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:19.8685367Z     result = job()
2025-12-04T10:35:19.8685902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:19.8686034Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:19.8686511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:19.8686611Z     self._precompile_worker()
2025-12-04T10:35:19.8687131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8687367Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8687886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8688062Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8688449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8688666Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8689046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8689342Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8689503Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8689873Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8689956Z ^
2025-12-04T10:35:19.8690349Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8690354Z 
2025-12-04T10:35:19.8690358Z 
2025-12-04T10:35:19.8690975Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8690980Z 
2025-12-04T10:35:19.8690983Z 
2025-12-04T10:35:19.8691179Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8691817Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8691821Z 
2025-12-04T10:35:19.8692055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8692248Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8692343Z frames [('total', 1)]
2025-12-04T10:35:19.8692444Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8692637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8692950Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:19.8693034Z graph_break []
2025-12-04T10:35:19.8693301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8693395Z frames [('total', 1)]
2025-12-04T10:35:19.8693493Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8693690Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8694000Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:19.8694086Z graph_break []
2025-12-04T10:35:19.8694220Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8694472Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8694579Z Traceback (most recent call last):
2025-12-04T10:35:19.8694924Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8695056Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8695482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8695697Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8696143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8696317Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8696756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8696960Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8697425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8697708Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8698163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8698297Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8698709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8698822Z     return self._compile_to_module()
2025-12-04T10:35:19.8699289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8699447Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8699897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8700010Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8700440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8700639Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8701160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8701274Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8701717Z   File "/tmp/tmpnpiyswwt/z7/cz7gnv5vlsc2vpat3huyfnaqm534wiinp2ejsbz6n6lifd3462lp.py", line 108, in <module>
2025-12-04T10:35:19.8702109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:19.8702210Z     self._wait_futures(scope)
2025-12-04T10:35:19.8702640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:19.8702744Z     kernel = result.result()
2025-12-04T10:35:19.8703124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:19.8703228Z     return self.result_fn()
2025-12-04T10:35:19.8703731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:19.8703848Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:19.8704183Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:19.8704188Z 
2025-12-04T10:35:19.8704331Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8704437Z Traceback (most recent call last):
2025-12-04T10:35:19.8704911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:19.8704997Z     result = job()
2025-12-04T10:35:19.8705561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:19.8705682Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:19.8706165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:19.8706267Z     self._precompile_worker()
2025-12-04T10:35:19.8706779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8706940Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8707451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8707705Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8708237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8708448Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8708829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8709130Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8709299Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8709672Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8709747Z ^
2025-12-04T10:35:19.8710144Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8710156Z 
2025-12-04T10:35:19.8710159Z 
2025-12-04T10:35:19.8710775Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8710779Z 
2025-12-04T10:35:19.8710783Z 
2025-12-04T10:35:19.8710971Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8711616Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8711620Z 
2025-12-04T10:35:19.8711850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8712043Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8712133Z frames [('total', 1)]
2025-12-04T10:35:19.8712237Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8712432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8712741Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:19.8712824Z graph_break []
2025-12-04T10:35:19.8713015Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8713107Z frames [('total', 1)]
2025-12-04T10:35:19.8713205Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8713517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8713830Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:19.8713919Z graph_break []
2025-12-04T10:35:19.8714107Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8714195Z frames [('total', 1)]
2025-12-04T10:35:19.8714296Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8714491Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8714800Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:19.8714887Z graph_break []
2025-12-04T10:35:19.8715490Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml -
2025-12-04T10:35:19.8715664Z =========================== short test summary info ============================
2025-12-04T10:35:19.8716429Z FAILED [0.5286s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:19.8716435Z 
2025-12-04T10:35:19.8716579Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8716686Z Traceback (most recent call last):
2025-12-04T10:35:19.8717158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:19.8717378Z     result = job()
2025-12-04T10:35:19.8717894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:19.8718015Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:19.8718503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:19.8718608Z     self._precompile_worker()
2025-12-04T10:35:19.8719122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8719284Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8719796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8719980Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8720366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8720575Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8720961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8721256Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8721417Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8721786Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8721860Z ^
2025-12-04T10:35:19.8722258Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8722267Z 
2025-12-04T10:35:19.8722271Z 
2025-12-04T10:35:19.8722886Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8722891Z 
2025-12-04T10:35:19.8722894Z 
2025-12-04T10:35:19.8723085Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8723809Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8723815Z 
2025-12-04T10:35:19.8724050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8724206Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8724391Z ============== 1 failed, 5 passed, 5 deselected, 2 rerun in 5.09s ==============
2025-12-04T10:35:19.8724480Z Got exit code 1
2025-12-04T10:35:19.8724576Z Retrying single test...
2025-12-04T10:35:19.8724983Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml
2025-12-04T10:35:19.8725128Z ============================= test session starts ==============================
2025-12-04T10:35:19.8725430Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8725528Z cachedir: .pytest_cache
2025-12-04T10:35:19.8725989Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8726097Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8726193Z configfile: pytest.ini
2025-12-04T10:35:19.8726663Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8726857Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8727514Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8727619Z Running 1 items in this shard
2025-12-04T10:35:19.8727624Z 
2025-12-04T10:35:19.8728617Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8729370Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8729742Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8730123Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.8730570Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8730977Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8731436Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8731912Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8732413Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8732913Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8733403Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8733779Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.8734226Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8734714Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.8735107Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.8735516Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.8736096Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8736549Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8737013Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8737450Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8737952Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8738440Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8738979Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8739539Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8739941Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8740321Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.8740818Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8741200Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.8741689Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8742157Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8742771Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8743080Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8744752Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8745221Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8746170Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8746788Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8747555Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8748144Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8748904Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8749570Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8750094Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8750848Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8751320Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8752094Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8752211Z ('RERUN', {'yellow': True}) [1.6612s] [100%]
2025-12-04T10:35:19.8753202Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8753944Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8754320Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8754702Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.8755143Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8755564Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8756053Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8756519Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8757020Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8757526Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8758013Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8758390Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.8758939Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8759353Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.8759746Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.8760139Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.8760694Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8761146Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8761619Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8762052Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8762556Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8763124Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8763665Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8764100Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8764507Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8764893Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.8765383Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8765821Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.8766316Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8766772Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8767391Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8767700Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8769359Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8769828Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8770802Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8771343Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8772108Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8772701Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8773463Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8774128Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8774654Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8775483Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8775844Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8776623Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8776741Z ('RERUN', {'yellow': True}) [0.2473s] [100%]
2025-12-04T10:35:19.8777719Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8778472Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8778837Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8779266Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.8779711Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8780104Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8780564Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8781028Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8781538Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8782039Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8782600Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8782981Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.8783425Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8783831Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.8784226Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.8784609Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.8785159Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8785641Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8786125Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8786550Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8787130Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8787619Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8788154Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8788597Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8788999Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8789385Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.8789879Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8790255Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.8790750Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8791211Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8791822Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8792128Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8793791Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8794330Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8795226Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8795819Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8796579Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8797170Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8797925Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8798586Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8799187Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8799938Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8800254Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8801021Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8801115Z FAILED [0.2465s] [100%]
2025-12-04T10:35:19.8801120Z 
2025-12-04T10:35:19.8801245Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8801504Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8801612Z Traceback (most recent call last):
2025-12-04T10:35:19.8801956Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8802093Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8802513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8802737Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8803185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8803357Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8803802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8803932Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8804394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8804675Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8805127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8805369Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8805834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8805938Z     return self._compile_to_module()
2025-12-04T10:35:19.8806362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8806502Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8806952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8807069Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8807497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8807704Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8808364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8808474Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8808925Z   File "/tmp/tmp8lonxdkc/uv/cuva5jjko7irujcy6q4rp6idbwoefjob6vzinwzhcbqljuilcl6d.py", line 58, in <module>
2025-12-04T10:35:19.8809329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8809597Z     kernel.precompile(
2025-12-04T10:35:19.8810077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8810178Z     self._precompile_worker()
2025-12-04T10:35:19.8810696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8810852Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8811371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8811547Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8811933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8812150Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8812539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8812831Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8813033Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8813402Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8813482Z ^
2025-12-04T10:35:19.8813883Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8813888Z 
2025-12-04T10:35:19.8814505Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8814510Z 
2025-12-04T10:35:19.8814518Z 
2025-12-04T10:35:19.8814708Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8815355Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8815360Z 
2025-12-04T10:35:19.8815593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8815781Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8815870Z frames [('total', 1)]
2025-12-04T10:35:19.8816083Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8816294Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8816493Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8816580Z graph_break []
2025-12-04T10:35:19.8816832Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8816941Z Traceback (most recent call last):
2025-12-04T10:35:19.8817287Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8817420Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8817844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8818060Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8818510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8818676Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8819160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8819287Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8819747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8820114Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8820563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8820688Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8821106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8821215Z     return self._compile_to_module()
2025-12-04T10:35:19.8821630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8821772Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8825874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8826019Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8826460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8826662Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8827175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8827285Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8827740Z   File "/tmp/tmpp9380s2h/jb/cjbpyxcrwk6uaym3ltnvecck5fx7bzzsku5nxmv2fg3krjinvksr.py", line 58, in <module>
2025-12-04T10:35:19.8828143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8828239Z     kernel.precompile(
2025-12-04T10:35:19.8828726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8828834Z     self._precompile_worker()
2025-12-04T10:35:19.8829352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8829510Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8830029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8830315Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8830706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8830920Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8831304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8831598Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8831801Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8832175Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8832250Z ^
2025-12-04T10:35:19.8832653Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8832659Z 
2025-12-04T10:35:19.8833281Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8833286Z 
2025-12-04T10:35:19.8833290Z 
2025-12-04T10:35:19.8833481Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8834118Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8834207Z 
2025-12-04T10:35:19.8834440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8834634Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8834722Z frames [('total', 1)]
2025-12-04T10:35:19.8834822Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8835031Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8835229Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8835317Z graph_break []
2025-12-04T10:35:19.8835506Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8835605Z frames [('total', 1)]
2025-12-04T10:35:19.8835723Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8835935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8836147Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8836235Z graph_break []
2025-12-04T10:35:19.8836365Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8836619Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8836725Z Traceback (most recent call last):
2025-12-04T10:35:19.8837071Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8837216Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8837638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8837851Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8838298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8838469Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8838916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8839041Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8839502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8839788Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8840326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8840457Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8840878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8840983Z     return self._compile_to_module()
2025-12-04T10:35:19.8841408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8841549Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8841996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8842110Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8842543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8842746Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8843252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8843362Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8843810Z   File "/tmp/tmppxehlg09/fy/cfyfnvajlle66ucqnsevabypmshx5viix7a3tpd4lke4f4vrkqqa.py", line 58, in <module>
2025-12-04T10:35:19.8844324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8850120Z     kernel.precompile(
2025-12-04T10:35:19.8850618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8850722Z     self._precompile_worker()
2025-12-04T10:35:19.8851253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8851410Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8851925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8852100Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8852496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8852706Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8853110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8853403Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8853603Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8853978Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8854056Z ^
2025-12-04T10:35:19.8854452Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8854457Z 
2025-12-04T10:35:19.8855076Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8855082Z 
2025-12-04T10:35:19.8855086Z 
2025-12-04T10:35:19.8855274Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8855971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8855978Z 
2025-12-04T10:35:19.8856317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8856507Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8856599Z frames [('total', 1)]
2025-12-04T10:35:19.8856699Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8856907Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8857100Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8857187Z graph_break []
2025-12-04T10:35:19.8857371Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8857469Z frames [('total', 1)]
2025-12-04T10:35:19.8857566Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8857759Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8857962Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8858044Z graph_break []
2025-12-04T10:35:19.8858235Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8858322Z frames [('total', 1)]
2025-12-04T10:35:19.8858422Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8858612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8858812Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8858898Z graph_break []
2025-12-04T10:35:19.8859532Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml -
2025-12-04T10:35:19.8859741Z =========================== short test summary info ============================
2025-12-04T10:35:19.8860464Z FAILED [0.2465s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8860835Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8860915Z ^
2025-12-04T10:35:19.8861316Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8861321Z 
2025-12-04T10:35:19.8861936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8861944Z 
2025-12-04T10:35:19.8861948Z 
2025-12-04T10:35:19.8862144Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8862781Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8862789Z 
2025-12-04T10:35:19.8863022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8863183Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8863362Z ================== 1 failed, 187 deselected, 2 rerun in 2.19s ==================
2025-12-04T10:35:19.8863453Z Got exit code 1
2025-12-04T10:35:19.8863545Z Retrying single test...
2025-12-04T10:35:19.8863950Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml
2025-12-04T10:35:19.8864096Z ============================= test session starts ==============================
2025-12-04T10:35:19.8864398Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8864495Z cachedir: .pytest_cache
2025-12-04T10:35:19.8864950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8865061Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8865158Z configfile: pytest.ini
2025-12-04T10:35:19.8865765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8865965Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8866538Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8866637Z Running 1 items in this shard
2025-12-04T10:35:19.8866641Z 
2025-12-04T10:35:19.8867633Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8868387Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8868765Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8869142Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.8869587Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8869983Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8870544Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8871231Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8871883Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8872389Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8872874Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8873250Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.8873701Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8874111Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.8874504Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.8874890Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.8875483Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8875942Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8876407Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8876840Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8877341Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8877927Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8878469Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8878903Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8879307Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8879690Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.8880182Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8880567Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.8881058Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8881519Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8882129Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8882483Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8884199Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8884662Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8885671Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8886220Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8886989Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8887573Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8888328Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8888991Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8889523Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8890384Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8890696Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8891466Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8891585Z ('RERUN', {'yellow': True}) [1.6693s] [100%]
2025-12-04T10:35:19.8892573Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8893323Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8893689Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8894067Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.8894552Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8894947Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8895488Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8895978Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8896482Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8896981Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8897473Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8897848Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.8898301Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8898717Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.8899162Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.8899546Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.8900096Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8900552Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8901018Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8901443Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8902030Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8902522Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8903059Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8903496Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8903900Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8904283Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.8904775Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8905159Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.8905650Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8906158Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8906770Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8907128Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8908951Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8909416Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8910316Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8910858Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8911622Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8912207Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8912964Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8913639Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8914307Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8915057Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8915386Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8916189Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8916308Z ('RERUN', {'yellow': True}) [0.2494s] [100%]
2025-12-04T10:35:19.8917291Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8918040Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8918404Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8918844Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:19.8919351Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8919747Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.8920210Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8920674Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8921176Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8921677Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8922156Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8922537Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.8922984Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8923392Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.8923782Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.8924167Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.8924717Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8925166Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8925767Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8926194Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8926695Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8927184Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8927719Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8928166Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8928568Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8928951Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.8929439Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8929818Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.8930352Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8930856Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8931472Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8931780Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.8933435Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8933903Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8934804Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8935343Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8936103Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8936693Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8937457Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8938300Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8938824Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8939619Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8939930Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8940698Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8940799Z FAILED [0.2485s] [100%]
2025-12-04T10:35:19.8940804Z 
2025-12-04T10:35:19.8940931Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8941185Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8941291Z Traceback (most recent call last):
2025-12-04T10:35:19.8941631Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8941818Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8942238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8942506Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8942950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8943123Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8943569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8943700Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8944160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8944448Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8944900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8945034Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8945465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8945577Z     return self._compile_to_module()
2025-12-04T10:35:19.8946022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8946161Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8946616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8946729Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8947160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8947363Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8947871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8947984Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8948536Z   File "/tmp/tmp9tcbijrv/xo/cxo33zxdzb3qc376pcuo6i3b6rmssudfx3eitm4empddy7gvcvqq.py", line 58, in <module>
2025-12-04T10:35:19.8948938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8949037Z     kernel.precompile(
2025-12-04T10:35:19.8949516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8949619Z     self._precompile_worker()
2025-12-04T10:35:19.8950138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8950292Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8950810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8950982Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8951372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8951589Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8951972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8952265Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8952510Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8952879Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8953006Z ^
2025-12-04T10:35:19.8953402Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8953407Z 
2025-12-04T10:35:19.8954030Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8954039Z 
2025-12-04T10:35:19.8954043Z 
2025-12-04T10:35:19.8954232Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8954872Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8954883Z 
2025-12-04T10:35:19.8955118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8955308Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8955421Z frames [('total', 1)]
2025-12-04T10:35:19.8955530Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8955767Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8955961Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8956047Z graph_break []
2025-12-04T10:35:19.8956306Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8956419Z Traceback (most recent call last):
2025-12-04T10:35:19.8956759Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8956896Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8957314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8957531Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8957978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8958148Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8958586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8958795Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8959259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8959542Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8959994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8960128Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8960543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8960649Z     return self._compile_to_module()
2025-12-04T10:35:19.8961068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8961214Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8961658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8961772Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8962200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8962399Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8962953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8963108Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8963553Z   File "/tmp/tmpjshhi764/zm/czmuberi25g4ahoncv7tyrwejxkrnzsnlto4j6mfvbr4wxi2cjlp.py", line 58, in <module>
2025-12-04T10:35:19.8963953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8964054Z     kernel.precompile(
2025-12-04T10:35:19.8964536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8964639Z     self._precompile_worker()
2025-12-04T10:35:19.8965160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8965317Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8965831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8966012Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8966398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8966608Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8966995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8967286Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8967487Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8967855Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8967932Z ^
2025-12-04T10:35:19.8968331Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8968339Z 
2025-12-04T10:35:19.8968953Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8968959Z 
2025-12-04T10:35:19.8968963Z 
2025-12-04T10:35:19.8969231Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8969870Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8969875Z 
2025-12-04T10:35:19.8970108Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8970296Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8970390Z frames [('total', 1)]
2025-12-04T10:35:19.8970500Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8970713Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8970907Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8970994Z graph_break []
2025-12-04T10:35:19.8971190Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8971277Z frames [('total', 1)]
2025-12-04T10:35:19.8971378Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8971574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8971777Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8971865Z graph_break []
2025-12-04T10:35:19.8971992Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8972243Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8972401Z Traceback (most recent call last):
2025-12-04T10:35:19.8972747Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8972956Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8973377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8973594Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8974045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8974222Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8974668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8974801Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8975267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8975566Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8976051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8976180Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8976607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8976711Z     return self._compile_to_module()
2025-12-04T10:35:19.8977127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8977274Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8977720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8977833Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8978274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8978481Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8979142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8979257Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8979700Z   File "/tmp/tmp9sbz1deu/sc/cscuwzk2qxdyvwkgqlg6pvzlidnuaf6v26jmkbp6ofr6gsbbgyhc.py", line 58, in <module>
2025-12-04T10:35:19.8980106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8980201Z     kernel.precompile(
2025-12-04T10:35:19.8980693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8980798Z     self._precompile_worker()
2025-12-04T10:35:19.8981312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8981476Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8981997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8982169Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8982562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8982774Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8983241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8983580Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8983782Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8984211Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8984286Z ^
2025-12-04T10:35:19.8984688Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8984693Z 
2025-12-04T10:35:19.8985311Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8985316Z 
2025-12-04T10:35:19.8985319Z 
2025-12-04T10:35:19.8985533Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8986208Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8986212Z 
2025-12-04T10:35:19.8986446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8986642Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8986732Z frames [('total', 1)]
2025-12-04T10:35:19.8986837Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8987055Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8987246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8987333Z graph_break []
2025-12-04T10:35:19.8987518Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8987606Z frames [('total', 1)]
2025-12-04T10:35:19.8987709Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8987901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8988105Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8988192Z graph_break []
2025-12-04T10:35:19.8988378Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8988465Z frames [('total', 1)]
2025-12-04T10:35:19.8988564Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8988751Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8989035Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8989122Z graph_break []
2025-12-04T10:35:19.8989684Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml -
2025-12-04T10:35:19.8989834Z =========================== short test summary info ============================
2025-12-04T10:35:19.8990454Z FAILED [0.2485s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8990835Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8990917Z ^
2025-12-04T10:35:19.8991317Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8991321Z 
2025-12-04T10:35:19.8991944Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8991948Z 
2025-12-04T10:35:19.8991952Z 
2025-12-04T10:35:19.8992139Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8992778Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8992830Z 
2025-12-04T10:35:19.8993061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8993219Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8993444Z ================== 1 failed, 187 deselected, 2 rerun in 2.20s ==================
2025-12-04T10:35:19.8993530Z Got exit code 1
2025-12-04T10:35:19.8993969Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8994332Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.8994742Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml
2025-12-04T10:35:19.8994899Z ============================= test session starts ==============================
2025-12-04T10:35:19.8995202Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8995296Z cachedir: .pytest_cache
2025-12-04T10:35:19.8995762Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8995872Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8995967Z configfile: pytest.ini
2025-12-04T10:35:19.8996436Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8996638Z collecting ... collected 188 items / 11 deselected / 177 selected
2025-12-04T10:35:19.8996764Z stepcurrent: skipping 11 already run items.
2025-12-04T10:35:19.8996863Z Running 177 items in this shard
2025-12-04T10:35:19.8996867Z 
2025-12-04T10:35:19.8997865Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8998621Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8998993Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.8999456Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.8999903Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9000304Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9000771Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9001240Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9001751Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9002258Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9002745Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9003121Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9003571Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9004024Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9004462Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9004853Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9005411Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9005867Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9006334Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9006767Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9007268Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9007900Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9008477Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9008935Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9009355Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9009762Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9010275Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9010672Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9011308Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9011789Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9012439Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9012763Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9014559Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9015046Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9016050Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9016671Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9017503Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9018093Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9018850Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9019561Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9020090Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9020848Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9021159Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9021930Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9022049Z ('RERUN', {'yellow': True}) [1.6732s] [  0%]
2025-12-04T10:35:19.9023028Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9023866Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9024233Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9024615Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9025060Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9025483Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9025979Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9026445Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9026956Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9027455Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9027939Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9028365Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9028810Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9029270Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9029665Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9030047Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9030600Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9031051Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9031521Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9031954Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9032464Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9032953Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9033488Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9033928Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9034327Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9034711Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9035199Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9035654Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9036150Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9036609Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9037225Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9037539Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9039201Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9039662Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9040609Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9041205Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9041977Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9042565Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9043325Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9043991Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9044524Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9045272Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9045583Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9046403Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9046525Z ('RERUN', {'yellow': True}) [0.2497s] [  0%]
2025-12-04T10:35:19.9047509Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9048330Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9048696Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9049083Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9049534Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9049929Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9050396Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9050869Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9051372Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9051877Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9052395Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9052820Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9053269Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9053691Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9054084Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9054463Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9055022Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9055483Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9056000Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9056431Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9056930Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9057424Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9057961Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9058409Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9058816Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9059367Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9059867Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9060244Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9060740Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9061201Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9061814Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9062130Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9063788Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9064293Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9065233Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9065833Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9066596Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9067187Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9067942Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9068615Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9069140Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9069885Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9070202Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9070973Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9071068Z FAILED [0.2485s] [  0%]
2025-12-04T10:35:19.9071073Z 
2025-12-04T10:35:19.9071277Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9071538Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9071652Z Traceback (most recent call last):
2025-12-04T10:35:19.9071994Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9072135Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9072559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9072777Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9073237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9073404Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9073854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9073980Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9074443Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9074727Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9075223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9075351Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9075777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9075923Z     return self._compile_to_module()
2025-12-04T10:35:19.9076344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9076492Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9076944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9077061Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9077487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9077695Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9078201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9078318Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9078763Z   File "/tmp/tmp8cc63eui/2g/c2guvru7lxggripjwctrfjt5hfi24ko4xenpsgegaxa6a7shmek5.py", line 58, in <module>
2025-12-04T10:35:19.9079170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9079269Z     kernel.precompile(
2025-12-04T10:35:19.9079755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9079856Z     self._precompile_worker()
2025-12-04T10:35:19.9080372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9080535Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9081051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9081231Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9081619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9081912Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9082301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9082595Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9082806Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9083176Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9083254Z ^
2025-12-04T10:35:19.9083662Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9083669Z 
2025-12-04T10:35:19.9084289Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9084294Z 
2025-12-04T10:35:19.9084297Z 
2025-12-04T10:35:19.9084491Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9085134Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9085139Z 
2025-12-04T10:35:19.9085395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9085665Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9085753Z frames [('total', 1)]
2025-12-04T10:35:19.9085868Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9086081Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9086317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9086413Z graph_break []
2025-12-04T10:35:19.9086666Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9086784Z Traceback (most recent call last):
2025-12-04T10:35:19.9087134Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9087267Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9087696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9087911Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9088363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9088538Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9088984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9089119Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9089588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9089867Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9090327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9090458Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9090882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9090986Z     return self._compile_to_module()
2025-12-04T10:35:19.9091404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9091554Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9092085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9092199Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9092632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9092832Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9093340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9093457Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9093889Z   File "/tmp/tmpz5q_71gu/pf/cpfzckpbvomki4hkwk2wjfi737oqxeouq7ywzvhqfinnsxnnb73i.py", line 58, in <module>
2025-12-04T10:35:19.9094294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9094389Z     kernel.precompile(
2025-12-04T10:35:19.9094874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9094974Z     self._precompile_worker()
2025-12-04T10:35:19.9095515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9095697Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9096209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9096430Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9096816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9097072Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9097462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9097757Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9097957Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9098336Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9098411Z ^
2025-12-04T10:35:19.9098815Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9098825Z 
2025-12-04T10:35:19.9099485Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9099493Z 
2025-12-04T10:35:19.9099497Z 
2025-12-04T10:35:19.9099689Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9100342Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9100347Z 
2025-12-04T10:35:19.9100579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9100770Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9100862Z frames [('total', 1)]
2025-12-04T10:35:19.9100964Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9101179Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9101374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9101471Z graph_break []
2025-12-04T10:35:19.9101655Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9101744Z frames [('total', 1)]
2025-12-04T10:35:19.9101845Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9102034Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9102350Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9102438Z graph_break []
2025-12-04T10:35:19.9102565Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9102818Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9102925Z Traceback (most recent call last):
2025-12-04T10:35:19.9103267Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9103401Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9103820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9104037Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9104497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9104661Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9105106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9105237Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9105723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9106081Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9106527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9106779Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9107197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9107308Z     return self._compile_to_module()
2025-12-04T10:35:19.9107927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9108072Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9108583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9108735Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9113369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9113602Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9114125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9114246Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9114703Z   File "/tmp/tmpymccacl8/rk/crke2xdhbj3meedbntxg6czc4qz6r2p3qojjwhvghfnqs4frkgpw.py", line 58, in <module>
2025-12-04T10:35:19.9115105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9115201Z     kernel.precompile(
2025-12-04T10:35:19.9115688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9115790Z     self._precompile_worker()
2025-12-04T10:35:19.9116308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9116464Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9116977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9117157Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9117704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9117927Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9118309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9118600Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9118805Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9119181Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9119262Z ^
2025-12-04T10:35:19.9119661Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9119667Z 
2025-12-04T10:35:19.9120285Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9120290Z 
2025-12-04T10:35:19.9120293Z 
2025-12-04T10:35:19.9120484Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9121124Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9121186Z 
2025-12-04T10:35:19.9121424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9121613Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9121760Z frames [('total', 1)]
2025-12-04T10:35:19.9121865Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9122073Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9122272Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9122363Z graph_break []
2025-12-04T10:35:19.9122552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9122643Z frames [('total', 1)]
2025-12-04T10:35:19.9122741Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9122930Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9123134Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9123221Z graph_break []
2025-12-04T10:35:19.9123402Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9123492Z frames [('total', 1)]
2025-12-04T10:35:19.9123598Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9123785Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9123989Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9124073Z graph_break []
2025-12-04T10:35:19.9124649Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml -
2025-12-04T10:35:19.9124797Z =========================== short test summary info ============================
2025-12-04T10:35:19.9125430Z FAILED [0.2485s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9125857Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9125933Z ^
2025-12-04T10:35:19.9126333Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9126340Z 
2025-12-04T10:35:19.9126953Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9127045Z 
2025-12-04T10:35:19.9127050Z 
2025-12-04T10:35:19.9127237Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9127882Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9127887Z 
2025-12-04T10:35:19.9128117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9128285Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9128456Z ================== 1 failed, 11 deselected, 2 rerun in 2.21s ===================
2025-12-04T10:35:19.9128548Z Got exit code 1
2025-12-04T10:35:19.9128646Z Retrying single test...
2025-12-04T10:35:19.9129052Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml
2025-12-04T10:35:19.9129205Z ============================= test session starts ==============================
2025-12-04T10:35:19.9129509Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9129603Z cachedir: .pytest_cache
2025-12-04T10:35:19.9130059Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9130169Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9130307Z configfile: pytest.ini
2025-12-04T10:35:19.9130778Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9131016Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.9131598Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9131696Z Running 1 items in this shard
2025-12-04T10:35:19.9131700Z 
2025-12-04T10:35:19.9132690Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9133441Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9133811Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9134198Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9134643Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9135052Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9135511Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9136004Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9136528Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9137028Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9137516Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9137972Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9138419Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9138827Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9139289Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9139673Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9140225Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9140676Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9141145Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9141570Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9142070Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9142615Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9143218Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9143661Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9144058Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9144441Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9144928Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9145308Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9145800Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9146256Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9146871Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9147180Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9148845Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9149394Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9150291Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9150831Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9151596Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9152189Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9152946Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9153608Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9154132Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9154924Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9155272Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9156095Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9156215Z ('RERUN', {'yellow': True}) [1.6996s] [100%]
2025-12-04T10:35:19.9157194Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9157944Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9158315Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9158712Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9159157Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9159549Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9160009Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9160475Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9160979Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9161558Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9162038Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9162417Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9162861Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9163271Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9163663Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9164043Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9164603Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9165050Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9165516Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9166033Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9166533Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9167063Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9167601Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9168040Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9168437Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9168821Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9169308Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9169685Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9170180Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9170638Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9171246Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9171556Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9173298Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9173763Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9174654Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9175202Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9176013Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9176603Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9177356Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9178163Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9178907Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9179805Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9180119Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9180884Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9181002Z ('RERUN', {'yellow': True}) [0.2515s] [100%]
2025-12-04T10:35:19.9181987Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9182732Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9183103Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9183481Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9183929Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9184326Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9184788Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9185253Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9185846Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9186349Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9186825Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9187206Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9187652Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9188063Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9188456Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9188841Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9189394Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9189840Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9190377Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9190843Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9191342Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9191838Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9192471Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9192912Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9193315Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9193697Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9194188Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9194566Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9195059Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9195541Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9196172Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9196483Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9198219Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9198687Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9199582Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9200126Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9200891Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9201478Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9202231Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9202936Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9203504Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9204256Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9204572Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9205340Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9205436Z FAILED [0.2495s] [100%]
2025-12-04T10:35:19.9205443Z 
2025-12-04T10:35:19.9205569Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9205822Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9205933Z Traceback (most recent call last):
2025-12-04T10:35:19.9206278Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9206419Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9206838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9207056Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9207505Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9207671Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9208321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9208453Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9209045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9209328Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9209778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9209906Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9210322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9210429Z     return self._compile_to_module()
2025-12-04T10:35:19.9210849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9210992Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9211437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9211557Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9211984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9212188Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9212695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9212861Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9213284Z   File "/tmp/tmpik_5wqao/ob/cob6n3um5rwdndqbfljtoc4j5vyujm37rrko3dab5nwzpjyhkffb.py", line 58, in <module>
2025-12-04T10:35:19.9213683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9213835Z     kernel.precompile(
2025-12-04T10:35:19.9214318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9214428Z     self._precompile_worker()
2025-12-04T10:35:19.9214945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9215098Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9215661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9215838Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9216225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9216439Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9216824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9217118Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9217318Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9217687Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9217762Z ^
2025-12-04T10:35:19.9218161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9218169Z 
2025-12-04T10:35:19.9218783Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9218790Z 
2025-12-04T10:35:19.9218794Z 
2025-12-04T10:35:19.9218983Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9219776Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9219782Z 
2025-12-04T10:35:19.9220018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9220206Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9220295Z frames [('total', 1)]
2025-12-04T10:35:19.9220400Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9220606Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9220805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9220892Z graph_break []
2025-12-04T10:35:19.9221143Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9221251Z Traceback (most recent call last):
2025-12-04T10:35:19.9221593Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9221726Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9222156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9222370Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9222814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9222980Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9223464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9223591Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9224102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9224381Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9224837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9224963Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9225376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9225505Z     return self._compile_to_module()
2025-12-04T10:35:19.9225954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9226098Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9226541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9226655Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9227083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9227287Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9227795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9227906Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9228332Z   File "/tmp/tmprc5k761_/lp/clps3gqidvhsma7uvwji6busb23skm6bjuhl65oruehcaryla2bh.py", line 58, in <module>
2025-12-04T10:35:19.9228738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9228835Z     kernel.precompile(
2025-12-04T10:35:19.9229314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9229417Z     self._precompile_worker()
2025-12-04T10:35:19.9230036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9230196Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9230708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9230878Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9231269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9231483Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9231865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9232156Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9232354Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9232731Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9232806Z ^
2025-12-04T10:35:19.9233201Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9233209Z 
2025-12-04T10:35:19.9233822Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9233869Z 
2025-12-04T10:35:19.9233873Z 
2025-12-04T10:35:19.9234060Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9234747Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9234752Z 
2025-12-04T10:35:19.9234981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9235174Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9235265Z frames [('total', 1)]
2025-12-04T10:35:19.9235364Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9235573Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9235764Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9235851Z graph_break []
2025-12-04T10:35:19.9236038Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9236125Z frames [('total', 1)]
2025-12-04T10:35:19.9236224Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9236415Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9236617Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9236704Z graph_break []
2025-12-04T10:35:19.9236829Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9237084Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9237193Z Traceback (most recent call last):
2025-12-04T10:35:19.9237535Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9237670Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9238089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9238305Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9238750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9238916Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9239355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9239567Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9240028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9240308Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9240754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9240883Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9241303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9241410Z     return self._compile_to_module()
2025-12-04T10:35:19.9241829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9241974Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9242418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9242533Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9242958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9243157Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9243711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9243860Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9244299Z   File "/tmp/tmpx340wjaq/mo/cmogrw5skvbbj2xu4hg6eqish63gmfdg6mr6bnmo42nj6xemg2j5.py", line 58, in <module>
2025-12-04T10:35:19.9244700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9244798Z     kernel.precompile(
2025-12-04T10:35:19.9245280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9245379Z     self._precompile_worker()
2025-12-04T10:35:19.9245946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9246101Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9246614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9246789Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9247174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9247383Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9247771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9248060Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9248259Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9248627Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9248708Z ^
2025-12-04T10:35:19.9249106Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9249114Z 
2025-12-04T10:35:19.9249729Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9249734Z 
2025-12-04T10:35:19.9249738Z 
2025-12-04T10:35:19.9250012Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9250656Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9250661Z 
2025-12-04T10:35:19.9250893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9251078Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9251171Z frames [('total', 1)]
2025-12-04T10:35:19.9251271Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9251478Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9251671Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9251757Z graph_break []
2025-12-04T10:35:19.9251946Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9252036Z frames [('total', 1)]
2025-12-04T10:35:19.9252146Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9252336Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9252547Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9252633Z graph_break []
2025-12-04T10:35:19.9252816Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9252907Z frames [('total', 1)]
2025-12-04T10:35:19.9253050Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9253240Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9253445Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9253570Z graph_break []
2025-12-04T10:35:19.9254138Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml -
2025-12-04T10:35:19.9254288Z =========================== short test summary info ============================
2025-12-04T10:35:19.9254922Z FAILED [0.2495s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9255299Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9255377Z ^
2025-12-04T10:35:19.9255773Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9255785Z 
2025-12-04T10:35:19.9256397Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9256405Z 
2025-12-04T10:35:19.9256408Z 
2025-12-04T10:35:19.9256594Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9257242Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9257246Z 
2025-12-04T10:35:19.9257479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9257638Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9257813Z ================== 1 failed, 187 deselected, 2 rerun in 2.23s ==================
2025-12-04T10:35:19.9257900Z Got exit code 1
2025-12-04T10:35:19.9257997Z Retrying single test...
2025-12-04T10:35:19.9258409Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml
2025-12-04T10:35:19.9258555Z ============================= test session starts ==============================
2025-12-04T10:35:19.9258856Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9258950Z cachedir: .pytest_cache
2025-12-04T10:35:19.9259575Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9259685Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9259780Z configfile: pytest.ini
2025-12-04T10:35:19.9260254Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9260448Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.9261027Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9261134Z Running 1 items in this shard
2025-12-04T10:35:19.9261138Z 
2025-12-04T10:35:19.9262129Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9262882Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9263254Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9263688Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9264135Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9264575Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9265044Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9265510Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9266015Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9266517Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9267003Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9267384Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9267836Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9268244Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9268637Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9269025Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9269583Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9270034Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9270502Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9271036Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9271541Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9272109Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9272653Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9273095Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9273492Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9273881Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9274371Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9274744Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9275288Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9275795Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9276446Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9276761Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9278419Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9278887Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9279791Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9280333Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9281095Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9281684Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9282440Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9283182Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9283708Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9284459Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9284773Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9285565Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9285705Z ('RERUN', {'yellow': True}) [1.6763s] [100%]
2025-12-04T10:35:19.9286689Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9287433Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9287919Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9288341Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9288785Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9289183Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9289646Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9290108Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9290618Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9291116Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9291594Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9291979Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9292422Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9292829Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9293223Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9293605Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9294162Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9294688Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9295159Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9295595Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9296129Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9296622Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9297159Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9297601Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9297998Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9298377Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9298869Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9299338Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9299834Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9300337Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9300956Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9301263Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9302917Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9303394Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9304287Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9304831Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9305620Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9306232Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9307067Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9307869Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9308397Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9309142Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9309462Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9310233Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9310350Z ('RERUN', {'yellow': True}) [0.2497s] [100%]
2025-12-04T10:35:19.9311327Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9312139Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9312585Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9312969Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:19.9313418Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9313811Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9314274Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9314741Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9315238Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9315742Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9316222Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9316601Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9317044Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9317450Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9317844Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9318226Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9318887Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9319339Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9319802Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9320238Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9320739Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9321237Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9321773Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9322213Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9322609Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9323039Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = -448.0
2025-12-04T10:35:19.9323530Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9323945Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = 448.0
2025-12-04T10:35:19.9324445Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9324903Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9325534Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9325874Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9327530Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9328003Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9328893Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9329444Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9330207Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9330876Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9331632Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9332291Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9332824Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9333569Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9333887Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9334652Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9334745Z FAILED [0.2486s] [100%]
2025-12-04T10:35:19.9334792Z 
2025-12-04T10:35:19.9334918Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9335171Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9335323Z Traceback (most recent call last):
2025-12-04T10:35:19.9335689Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9335843Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9336278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9336495Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9336941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9337109Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9337553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9337682Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9338146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9338442Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9338896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9339072Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9339497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9339602Z     return self._compile_to_module()
2025-12-04T10:35:19.9340017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9340166Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9340611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9340731Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9341156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9341440Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9341951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9342061Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9342506Z   File "/tmp/tmptaq29jvg/2a/c2aqw2alrfduad2mdb2ncjpize4q2h4xiirhhtewzphqzzoxshhs.py", line 58, in <module>
2025-12-04T10:35:19.9342910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9343009Z     kernel.precompile(
2025-12-04T10:35:19.9343494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9343598Z     self._precompile_worker()
2025-12-04T10:35:19.9344115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9344283Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9344796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9344976Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9345373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9345668Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9346051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9346383Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9346581Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9346954Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9347030Z ^
2025-12-04T10:35:19.9347431Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9347436Z 
2025-12-04T10:35:19.9348054Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9348062Z 
2025-12-04T10:35:19.9348066Z 
2025-12-04T10:35:19.9348256Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9348900Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9348906Z 
2025-12-04T10:35:19.9349142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9349339Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9349433Z frames [('total', 1)]
2025-12-04T10:35:19.9349538Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9349744Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9349937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9350029Z graph_break []
2025-12-04T10:35:19.9350280Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9350390Z Traceback (most recent call last):
2025-12-04T10:35:19.9350736Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9350870Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9351292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9351507Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9352039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9352213Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9352652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9352778Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9353245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9353523Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9353976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9354101Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9354517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9354623Z     return self._compile_to_module()
2025-12-04T10:35:19.9355039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9355187Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9355682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9355865Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9356294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9356535Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9357040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9357160Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9357608Z   File "/tmp/tmp5dp7zlqg/wp/cwpnespmmlwlvhbagstjsstmqnc5p6ceiqwoai7lw3zk44qw3ava.py", line 58, in <module>
2025-12-04T10:35:19.9358015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9358112Z     kernel.precompile(
2025-12-04T10:35:19.9358593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9358698Z     self._precompile_worker()
2025-12-04T10:35:19.9359210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9359369Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9359884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9360056Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9360446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9360663Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9361046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9361341Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9361538Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9361910Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9361985Z ^
2025-12-04T10:35:19.9362459Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9362465Z 
2025-12-04T10:35:19.9363086Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9363091Z 
2025-12-04T10:35:19.9363095Z 
2025-12-04T10:35:19.9363283Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9363933Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9363938Z 
2025-12-04T10:35:19.9364168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9364362Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9364451Z frames [('total', 1)]
2025-12-04T10:35:19.9364552Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9364770Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9364964Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9365048Z graph_break []
2025-12-04T10:35:19.9365237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9365329Z frames [('total', 1)]
2025-12-04T10:35:19.9365427Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9365665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9365868Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9365955Z graph_break []
2025-12-04T10:35:19.9366080Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9366375Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9366484Z Traceback (most recent call last):
2025-12-04T10:35:19.9366831Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9366967Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9367397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9367610Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9368059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9368227Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9368671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9368806Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9369272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9369558Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9370011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9370137Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9370552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9370664Z     return self._compile_to_module()
2025-12-04T10:35:19.9371080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9371230Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9371672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9371794Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9372306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9372509Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9373018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9373133Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9373583Z   File "/tmp/tmprr3o4bwn/fa/cfacopmfzqejrwnkyt657ywnsombld27bdowi7qrlsxiz3ur4tov.py", line 58, in <module>
2025-12-04T10:35:19.9373983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9374081Z     kernel.precompile(
2025-12-04T10:35:19.9374563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9374662Z     self._precompile_worker()
2025-12-04T10:35:19.9375185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9375345Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9375857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9376074Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9376460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9376674Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9377101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9377392Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9377600Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9377975Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9378054Z ^
2025-12-04T10:35:19.9378453Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9378461Z 
2025-12-04T10:35:19.9379143Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9379148Z 
2025-12-04T10:35:19.9379153Z 
2025-12-04T10:35:19.9379347Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9379990Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9379999Z 
2025-12-04T10:35:19.9380231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9380420Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9380509Z frames [('total', 1)]
2025-12-04T10:35:19.9380610Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9380818Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9381016Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9381104Z graph_break []
2025-12-04T10:35:19.9381289Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9381379Z frames [('total', 1)]
2025-12-04T10:35:19.9381481Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9381669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9381876Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9382045Z graph_break []
2025-12-04T10:35:19.9382231Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9382324Z frames [('total', 1)]
2025-12-04T10:35:19.9382421Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9382609Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9382812Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9382900Z graph_break []
2025-12-04T10:35:19.9383466Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml -
2025-12-04T10:35:19.9383621Z =========================== short test summary info ============================
2025-12-04T10:35:19.9384248Z FAILED [0.2486s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9384628Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9384703Z ^
2025-12-04T10:35:19.9385100Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9385105Z 
2025-12-04T10:35:19.9385772Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9385822Z 
2025-12-04T10:35:19.9385826Z 
2025-12-04T10:35:19.9386014Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9386695Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9386699Z 
2025-12-04T10:35:19.9386932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9387099Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9387273Z ================== 1 failed, 187 deselected, 2 rerun in 2.21s ==================
2025-12-04T10:35:19.9387359Z Got exit code 1
2025-12-04T10:35:19.9387795Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.9388156Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.9388565Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml
2025-12-04T10:35:19.9388715Z ============================= test session starts ==============================
2025-12-04T10:35:19.9389016Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9389113Z cachedir: .pytest_cache
2025-12-04T10:35:19.9389570Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9393501Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9393622Z configfile: pytest.ini
2025-12-04T10:35:19.9394098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9394303Z collecting ... collected 188 items / 12 deselected / 176 selected
2025-12-04T10:35:19.9394434Z stepcurrent: skipping 12 already run items.
2025-12-04T10:35:19.9394534Z Running 176 items in this shard
2025-12-04T10:35:19.9394539Z 
2025-12-04T10:35:19.9395590Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9396483Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9396856Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9397229Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9397678Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9398084Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9398547Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9399021Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9399520Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9400023Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9400554Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9400934Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9401433Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9401845Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9402240Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9402625Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9403132Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9403588Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9404057Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9404553Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9405048Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9405634Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9406072Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9406473Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9406857Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9407343Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9408030Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9408523Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9408969Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9409577Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9409877Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9411532Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9411990Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9412934Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9413528Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9414285Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9414872Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9415616Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9416276Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9416791Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9417530Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9417842Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9418597Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9418708Z ('RERUN', {'yellow': True}) [1.9521s] [  0%]
2025-12-04T10:35:19.9419747Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9420586Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9420947Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9421313Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9421746Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9422129Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9422583Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9423045Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9423533Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9424024Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9424531Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9424902Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9425393Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9425824Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9426210Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9426580Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9427076Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9427516Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9427973Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9428462Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9428939Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9429470Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9429897Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9430286Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9430650Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9431206Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9431575Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9432056Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9432505Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9433106Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9433403Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9435053Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9435555Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9436479Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9437048Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9437806Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9438379Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9439130Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9439778Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9440294Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9441029Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9441329Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9442089Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9442197Z ('RERUN', {'yellow': True}) [0.3685s] [  0%]
2025-12-04T10:35:19.9443282Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9444013Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9444370Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9444736Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9445162Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9445553Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9445999Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9446453Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9446943Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9447432Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9447952Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9448357Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9448791Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9449189Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9449570Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9449952Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9450453Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9450897Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9451351Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9451841Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9452323Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9452844Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9453275Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9453661Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9454028Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9454582Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9454946Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9455448Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9455924Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9456524Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9456827Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9458473Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9458972Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9459903Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9460485Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9461234Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9461812Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9462558Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9463206Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9463727Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9464457Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9464759Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9465517Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9465625Z FAILED [0.3667s] [  0%]
2025-12-04T10:35:19.9465630Z 
2025-12-04T10:35:19.9465759Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9466016Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9466275Z Traceback (most recent call last):
2025-12-04T10:35:19.9466607Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9466733Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9467148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9467357Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9467797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9467956Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9468385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9468508Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9468969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9469240Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9469681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9469846Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9470254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9470351Z     return self._compile_to_module()
2025-12-04T10:35:19.9470806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9470940Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9471383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9471490Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9471904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9472096Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9472595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9472702Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9473150Z   File "/tmp/tmpulvykvmu/oa/coauqkvaipwywfcbw5iluza47wxrwaoxbco5tvf7uqjyyv5ziqiz.py", line 113, in <module>
2025-12-04T10:35:19.9473540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9473625Z     kernel.precompile(
2025-12-04T10:35:19.9474098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9474192Z     self._precompile_worker()
2025-12-04T10:35:19.9474697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9474841Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9475345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9475516Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9475995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9476274Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9476875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9477224Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9477553Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9477942Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9478014Z ^
2025-12-04T10:35:19.9478434Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9478439Z 
2025-12-04T10:35:19.9479087Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9479096Z 
2025-12-04T10:35:19.9479100Z 
2025-12-04T10:35:19.9479294Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9479992Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9479996Z 
2025-12-04T10:35:19.9480241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9480440Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9480530Z frames [('total', 1)]
2025-12-04T10:35:19.9480684Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9480880Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9481066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9481216Z graph_break []
2025-12-04T10:35:19.9481460Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9481562Z Traceback (most recent call last):
2025-12-04T10:35:19.9481903Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9482028Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9482439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9482643Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9483074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9483238Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9483666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9483789Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9484240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9484516Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9484958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9485078Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9485512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9485635Z     return self._compile_to_module()
2025-12-04T10:35:19.9486039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9486176Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9486612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9486717Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9487215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9487409Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9487910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9488016Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9488460Z   File "/tmp/tmphwflrjpa/sw/cswj7egzn2q73olgfhdyzu4eylzehnbazgcsdyqiil4cwohbgutv.py", line 113, in <module>
2025-12-04T10:35:19.9488851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9488946Z     kernel.precompile(
2025-12-04T10:35:19.9489414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9489512Z     self._precompile_worker()
2025-12-04T10:35:19.9490116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9490270Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9490774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9490936Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9491366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9491567Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9491978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9492257Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9492451Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9492813Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9492884Z ^
2025-12-04T10:35:19.9493269Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9493283Z 
2025-12-04T10:35:19.9493893Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9493898Z 
2025-12-04T10:35:19.9493902Z 
2025-12-04T10:35:19.9494086Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9494729Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9494734Z 
2025-12-04T10:35:19.9494958Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9495140Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9495223Z frames [('total', 1)]
2025-12-04T10:35:19.9495320Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9495540Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9495753Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9495833Z graph_break []
2025-12-04T10:35:19.9496013Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9496101Z frames [('total', 1)]
2025-12-04T10:35:19.9496194Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9496373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9496563Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9496645Z graph_break []
2025-12-04T10:35:19.9496841Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9497085Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9497189Z Traceback (most recent call last):
2025-12-04T10:35:19.9497520Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9497654Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9498062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9498269Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9498706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9498868Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9499358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9499482Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9499932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9500204Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9500686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9500805Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9501250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9501348Z     return self._compile_to_module()
2025-12-04T10:35:19.9501769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9501902Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9502334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9502442Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9502859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9503051Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9503548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9503653Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9504095Z   File "/tmp/tmpcq6xxgnx/u5/cu5odgkkqj2qt5iku45hojz2nksqecnxb6sqwnwvdt2w4474rj6b.py", line 113, in <module>
2025-12-04T10:35:19.9504487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9504577Z     kernel.precompile(
2025-12-04T10:35:19.9505048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9505143Z     self._precompile_worker()
2025-12-04T10:35:19.9505649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9505802Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9506304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9506472Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9506847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9507125Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9507502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9507934Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9508133Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9508604Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9508714Z ^
2025-12-04T10:35:19.9509193Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9509202Z 
2025-12-04T10:35:19.9509807Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9509818Z 
2025-12-04T10:35:19.9509822Z 
2025-12-04T10:35:19.9510004Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9510646Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9510651Z 
2025-12-04T10:35:19.9510873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9511144Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9511226Z frames [('total', 1)]
2025-12-04T10:35:19.9511322Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9511588Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9511773Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9511854Z graph_break []
2025-12-04T10:35:19.9512030Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9512113Z frames [('total', 1)]
2025-12-04T10:35:19.9512208Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9512391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9512584Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9512662Z graph_break []
2025-12-04T10:35:19.9512837Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9512925Z frames [('total', 1)]
2025-12-04T10:35:19.9513016Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9513197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9513393Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9513469Z graph_break []
2025-12-04T10:35:19.9514032Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml -
2025-12-04T10:35:19.9514174Z =========================== short test summary info ============================
2025-12-04T10:35:19.9514795Z FAILED [0.3667s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9515163Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9515235Z ^
2025-12-04T10:35:19.9515672Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9515680Z 
2025-12-04T10:35:19.9516284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9516289Z 
2025-12-04T10:35:19.9516293Z 
2025-12-04T10:35:19.9516584Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9517229Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9517233Z 
2025-12-04T10:35:19.9517456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9517607Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9517772Z ================== 1 failed, 12 deselected, 2 rerun in 2.72s ===================
2025-12-04T10:35:19.9517852Z Got exit code 1
2025-12-04T10:35:19.9517941Z Retrying single test...
2025-12-04T10:35:19.9518339Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml
2025-12-04T10:35:19.9518471Z ============================= test session starts ==============================
2025-12-04T10:35:19.9518768Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9518854Z cachedir: .pytest_cache
2025-12-04T10:35:19.9519300Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9519399Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9519486Z configfile: pytest.ini
2025-12-04T10:35:19.9519944Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9520200Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.9520775Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9520908Z Running 1 items in this shard
2025-12-04T10:35:19.9520913Z 
2025-12-04T10:35:19.9521905Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9522645Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9523013Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9523377Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9523805Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9524189Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9524642Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9525095Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9525633Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9526133Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9526607Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9526971Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9527482Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9527881Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9528261Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9528638Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9529132Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9529570Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9530034Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9530517Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9530995Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9531575Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9531999Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9532427Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9532797Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9533274Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9533636Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9534115Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9534675Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9535276Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9535613Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9537284Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9537742Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9538717Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9539303Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9540057Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9540635Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9541383Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9542044Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9542575Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9543313Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9543665Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9544421Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9544571Z ('RERUN', {'yellow': True}) [1.9420s] [100%]
2025-12-04T10:35:19.9545565Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9546299Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9546664Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9547025Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9547467Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9547855Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9548306Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9548765Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9549256Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9549754Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9550228Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9550672Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9551115Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9551509Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9551901Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9552272Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9552769Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9553213Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9553670Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9554159Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9554635Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9555208Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9555720Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9556117Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9556493Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9556977Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9557343Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9557834Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9558283Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9558888Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9559192Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9560844Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9561297Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9562300Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9562832Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9563583Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9564164Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9564908Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9565565Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9566083Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9566913Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9567263Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9568062Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9568173Z ('RERUN', {'yellow': True}) [0.3708s] [100%]
2025-12-04T10:35:19.9569153Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9569891Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9570248Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9570617Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9571042Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9571429Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9571883Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9572333Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9572827Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9573316Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9573780Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9574234Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9574670Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9575070Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9575468Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9575873Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9576376Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9576815Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9577279Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9577771Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9578247Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9578817Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9579361Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9579754Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9580134Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9580610Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9580978Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9581460Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9581912Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9582508Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9582822Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9584467Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9584930Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9585946Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9586477Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9587232Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9587813Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9588571Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9589231Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9589751Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9590485Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9590829Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9591635Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9591721Z FAILED [0.3699s] [100%]
2025-12-04T10:35:19.9591730Z 
2025-12-04T10:35:19.9591850Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9592097Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9592199Z Traceback (most recent call last):
2025-12-04T10:35:19.9592535Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9592664Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9593079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9593293Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9593728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9593900Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9594334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9594452Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9594914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9595184Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9595635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9595760Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9596166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9596266Z     return self._compile_to_module()
2025-12-04T10:35:19.9596758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9596901Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9597342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9597450Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9597875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9598069Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9598571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9598675Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9599119Z   File "/tmp/tmpeers9ivh/dg/cdgnyp7jueorxhm6ynrxygmlo2o76gxsd3mrhkxp3dth2arpjh5u.py", line 113, in <module>
2025-12-04T10:35:19.9599518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9599607Z     kernel.precompile(
2025-12-04T10:35:19.9600073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9600173Z     self._precompile_worker()
2025-12-04T10:35:19.9600726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9600882Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9601425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9601587Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9601974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9602175Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9602549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9602830Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9603024Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9603388Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9603456Z ^
2025-12-04T10:35:19.9603846Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9603851Z 
2025-12-04T10:35:19.9604462Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9604467Z 
2025-12-04T10:35:19.9604470Z 
2025-12-04T10:35:19.9604648Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9605298Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9605303Z 
2025-12-04T10:35:19.9605528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9605738Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9605828Z frames [('total', 1)]
2025-12-04T10:35:19.9605941Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9606149Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9606336Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9606414Z graph_break []
2025-12-04T10:35:19.9606770Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9606874Z Traceback (most recent call last):
2025-12-04T10:35:19.9607212Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9607334Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9607900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9608120Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9608557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9608719Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9609155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9609284Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9609736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9610004Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9610438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9610629Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9611035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9611190Z     return self._compile_to_module()
2025-12-04T10:35:19.9611598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9611735Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9612181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9612285Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9612699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9612896Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9613393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9613497Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9613933Z   File "/tmp/tmphuw91yu4/ew/cew2syydjpk5ch5yn4fvdwwhohx4q5otnd33lfa2qlgqzsm3raae.py", line 113, in <module>
2025-12-04T10:35:19.9614330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9614428Z     kernel.precompile(
2025-12-04T10:35:19.9614898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9614991Z     self._precompile_worker()
2025-12-04T10:35:19.9615499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9615673Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9616206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9616369Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9616747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9616953Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9617438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9617729Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9617923Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9618284Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9618362Z ^
2025-12-04T10:35:19.9618752Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9618757Z 
2025-12-04T10:35:19.9619409Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9619414Z 
2025-12-04T10:35:19.9619418Z 
2025-12-04T10:35:19.9619605Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9620246Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9620257Z 
2025-12-04T10:35:19.9620480Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9620664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9620802Z frames [('total', 1)]
2025-12-04T10:35:19.9620897Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9621092Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9621318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9621402Z graph_break []
2025-12-04T10:35:19.9621587Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9621667Z frames [('total', 1)]
2025-12-04T10:35:19.9621765Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9621951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9622141Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9622220Z graph_break []
2025-12-04T10:35:19.9622348Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9622588Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9622689Z Traceback (most recent call last):
2025-12-04T10:35:19.9623028Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9623153Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9623567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9623775Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9624211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9624372Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9624800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9624923Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9625379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9625652Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9626098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9626219Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9626703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9626812Z     return self._compile_to_module()
2025-12-04T10:35:19.9627219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9627355Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9627794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9627905Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9628322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9628519Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9629018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9629123Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9629562Z   File "/tmp/tmpvvfj7qf6/er/cerelih2buxbfk4bhpgjxaygp5h5rr6ur2bvva3wflcr7p7hmm2m.py", line 113, in <module>
2025-12-04T10:35:19.9629956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9630043Z     kernel.precompile(
2025-12-04T10:35:19.9630557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9630653Z     self._precompile_worker()
2025-12-04T10:35:19.9631156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9631352Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9631858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9632022Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9632404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9632606Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9632982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9633268Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9633458Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9633822Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9633889Z ^
2025-12-04T10:35:19.9634287Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9634296Z 
2025-12-04T10:35:19.9634896Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9634901Z 
2025-12-04T10:35:19.9634905Z 
2025-12-04T10:35:19.9635081Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9635775Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9635780Z 
2025-12-04T10:35:19.9636001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9636190Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9636270Z frames [('total', 1)]
2025-12-04T10:35:19.9636362Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9636726Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9636911Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9636992Z graph_break []
2025-12-04T10:35:19.9637169Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9637253Z frames [('total', 1)]
2025-12-04T10:35:19.9637353Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9637537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9637729Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9637812Z graph_break []
2025-12-04T10:35:19.9637991Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9638075Z frames [('total', 1)]
2025-12-04T10:35:19.9638180Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9638362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9638559Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9638637Z graph_break []
2025-12-04T10:35:19.9639192Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml -
2025-12-04T10:35:19.9639336Z =========================== short test summary info ============================
2025-12-04T10:35:19.9639959Z FAILED [0.3699s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9640369Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9640511Z ^
2025-12-04T10:35:19.9640900Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9640905Z 
2025-12-04T10:35:19.9641520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9641525Z 
2025-12-04T10:35:19.9641530Z 
2025-12-04T10:35:19.9641713Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9642357Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9642365Z 
2025-12-04T10:35:19.9642585Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9642738Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9642913Z ================== 1 failed, 187 deselected, 2 rerun in 2.72s ==================
2025-12-04T10:35:19.9642997Z Got exit code 1
2025-12-04T10:35:19.9643087Z Retrying single test...
2025-12-04T10:35:19.9643503Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml
2025-12-04T10:35:19.9643637Z ============================= test session starts ==============================
2025-12-04T10:35:19.9643935Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9644021Z cachedir: .pytest_cache
2025-12-04T10:35:19.9644467Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9644583Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9644675Z configfile: pytest.ini
2025-12-04T10:35:19.9645130Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9645329Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.9646028Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9646134Z Running 1 items in this shard
2025-12-04T10:35:19.9646139Z 
2025-12-04T10:35:19.9647122Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9647864Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9648228Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9648604Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9649046Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9649432Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9649893Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9650390Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9650879Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9651420Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9651890Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9652262Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9652698Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9653095Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9653484Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9653858Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9654357Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9654799Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9655254Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9655747Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9656234Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9656770Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9657196Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9657697Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9658096Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9658605Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9659004Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9659540Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9659998Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9660599Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9660899Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9662555Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9663090Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9663983Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9664520Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9665281Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9665908Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9666662Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9667312Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9667825Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9668573Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9668875Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9669714Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9669821Z ('RERUN', {'yellow': True}) [1.9424s] [100%]
2025-12-04T10:35:19.9670800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9671543Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9671899Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9672269Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9672699Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9673086Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9673539Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9674040Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9674544Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9675076Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9675550Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9675920Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9676352Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9676762Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9677145Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9677526Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9678023Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9678461Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9682591Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9683082Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9683580Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9684105Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9684639Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9685029Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9685397Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9685925Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9686293Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9686777Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9687224Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9687828Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9688131Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9689777Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9690341Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9691227Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9691758Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9692511Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9693093Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9693839Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9694484Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9695000Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9695737Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9696043Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9696884Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9696991Z ('RERUN', {'yellow': True}) [0.3698s] [100%]
2025-12-04T10:35:19.9697979Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9698713Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9699136Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9699500Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5
2025-12-04T10:35:19.9699931Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9700313Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9700761Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9701297Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9701860Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9702392Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9702891Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9703284Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:19.9703755Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9704180Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:19.9704592Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:19.9704991Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:19.9705551Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9706048Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9706533Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9707057Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9707572Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9708304Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9708855Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9709273Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9709667Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9710179Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9710574Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9711090Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9711573Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9712218Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9712538Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9714323Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9714917Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9715864Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9716431Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9717242Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9717859Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9718665Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9719370Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9719924Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9720714Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9721037Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9721934Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9722016Z FAILED [0.3690s] [100%]
2025-12-04T10:35:19.9722022Z 
2025-12-04T10:35:19.9722138Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9722391Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9722490Z Traceback (most recent call last):
2025-12-04T10:35:19.9722821Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9722951Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9723362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9723576Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9724007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9724168Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9724597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9724758Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9725270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9725605Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9726203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9726352Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9726862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9726982Z     return self._compile_to_module()
2025-12-04T10:35:19.9727491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9727657Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9728116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9728219Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9728632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9728827Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9729325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9729426Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9729847Z   File "/tmp/tmpi94rswo_/qm/cqmul4ihb2q7mtn4idinpdmrnj3ke5mlqu7zft73jza6ojbzmikj.py", line 113, in <module>
2025-12-04T10:35:19.9730233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9730322Z     kernel.precompile(
2025-12-04T10:35:19.9730793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9730885Z     self._precompile_worker()
2025-12-04T10:35:19.9731393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9731538Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9732710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9732879Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9733255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9733459Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9733830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9734111Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9734302Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9734659Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9734729Z ^
2025-12-04T10:35:19.9735154Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9735160Z 
2025-12-04T10:35:19.9735919Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9735926Z 
2025-12-04T10:35:19.9735930Z 
2025-12-04T10:35:19.9736156Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9737008Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9737017Z 
2025-12-04T10:35:19.9737335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9737513Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9737599Z frames [('total', 1)]
2025-12-04T10:35:19.9737689Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9737887Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9738072Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9738147Z graph_break []
2025-12-04T10:35:19.9738389Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9738495Z Traceback (most recent call last):
2025-12-04T10:35:19.9738827Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9738950Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9739402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9739610Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9740044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9740206Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9740638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9740756Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9741205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9741479Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9741915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9742037Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9742442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9742625Z     return self._compile_to_module()
2025-12-04T10:35:19.9743037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9743174Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9743605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9743709Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9744127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9744319Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9744824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9744926Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9745359Z   File "/tmp/tmplyp9z0_d/qz/cqzpqb5g2dn76mrsrliltewzmsmd63hczhwscljqn3opivcxpppp.py", line 113, in <module>
2025-12-04T10:35:19.9745800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9745885Z     kernel.precompile(
2025-12-04T10:35:19.9746356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9746494Z     self._precompile_worker()
2025-12-04T10:35:19.9747000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9747186Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9747687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9747854Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9748239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9748444Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9748813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9749091Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9749283Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9749641Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9749712Z ^
2025-12-04T10:35:19.9750099Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9750104Z 
2025-12-04T10:35:19.9750714Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9750719Z 
2025-12-04T10:35:19.9750723Z 
2025-12-04T10:35:19.9750908Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9751543Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9751550Z 
2025-12-04T10:35:19.9751773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9751948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9752030Z frames [('total', 1)]
2025-12-04T10:35:19.9752123Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9752316Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9752498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9752658Z graph_break []
2025-12-04T10:35:19.9752833Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9752914Z frames [('total', 1)]
2025-12-04T10:35:19.9753003Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9753180Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9753370Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9753449Z graph_break []
2025-12-04T10:35:19.9753563Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9753805Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9753903Z Traceback (most recent call last):
2025-12-04T10:35:19.9754235Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9754361Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9754777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9754985Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9755418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9755574Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9756056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9756172Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9756673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9756940Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9757380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9757499Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9757901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9757997Z     return self._compile_to_module()
2025-12-04T10:35:19.9758409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9758539Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9758981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9759096Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9759637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9759913Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9760553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9760691Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9761125Z   File "/tmp/tmpuwd5n7ww/c2/cc2if36af3aygo6zlipqsg5nkk7qdid33txk5tpovfoztw5djvu6.py", line 113, in <module>
2025-12-04T10:35:19.9761522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9761611Z     kernel.precompile(
2025-12-04T10:35:19.9762082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9762172Z     self._precompile_worker()
2025-12-04T10:35:19.9762779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9762931Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9763437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9763601Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9763977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9764184Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9764553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9764837Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9765025Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9765387Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9765457Z ^
2025-12-04T10:35:19.9765892Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9765896Z 
2025-12-04T10:35:19.9766502Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9766552Z 
2025-12-04T10:35:19.9766556Z 
2025-12-04T10:35:19.9766734Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9767407Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9767415Z 
2025-12-04T10:35:19.9767637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9767817Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9767901Z frames [('total', 1)]
2025-12-04T10:35:19.9767992Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9768185Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9768372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9768452Z graph_break []
2025-12-04T10:35:19.9768624Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9768707Z frames [('total', 1)]
2025-12-04T10:35:19.9768800Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9768982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9769171Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9769247Z graph_break []
2025-12-04T10:35:19.9769424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9769510Z frames [('total', 1)]
2025-12-04T10:35:19.9769599Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9769779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9769967Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9770042Z graph_break []
2025-12-04T10:35:19.9770599Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml -
2025-12-04T10:35:19.9770740Z =========================== short test summary info ============================
2025-12-04T10:35:19.9771363Z FAILED [0.3690s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9771724Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9771792Z ^
2025-12-04T10:35:19.9772289Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9772294Z 
2025-12-04T10:35:19.9772895Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9772900Z 
2025-12-04T10:35:19.9772906Z 
2025-12-04T10:35:19.9773086Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9773830Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9773838Z 
2025-12-04T10:35:19.9774062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9774207Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9774375Z ================== 1 failed, 187 deselected, 2 rerun in 2.72s ==================
2025-12-04T10:35:19.9774457Z Got exit code 1
2025-12-04T10:35:19.9774882Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9775234Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.9775627Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml
2025-12-04T10:35:19.9775806Z ============================= test session starts ==============================
2025-12-04T10:35:19.9776099Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9776228Z cachedir: .pytest_cache
2025-12-04T10:35:19.9776672Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9776779Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9776865Z configfile: pytest.ini
2025-12-04T10:35:19.9777323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9777510Z collecting ... collected 188 items / 13 deselected / 175 selected
2025-12-04T10:35:19.9777625Z stepcurrent: skipping 13 already run items.
2025-12-04T10:35:19.9777719Z Running 175 items in this shard
2025-12-04T10:35:19.9777724Z 
2025-12-04T10:35:19.9778710Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9779599Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9779957Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9780330Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.9780717Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9781171Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9781628Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9782119Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9782691Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9783163Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9783536Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.9784070Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9784560Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9785006Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9785451Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9785858Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.9786259Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.9786689Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.9787329Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9787799Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9788293Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9788773Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9789237Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9789717Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9790155Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9790605Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9791032Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9791418Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9791784Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9792260Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9792625Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9793108Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9793633Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9794232Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9794528Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9796390Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9796840Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9797722Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9798289Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9799079Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9799656Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9800396Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9801045Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9801559Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9802385Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9802688Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9803442Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9803550Z ('RERUN', {'yellow': True}) [1.6982s] [  0%]
2025-12-04T10:35:19.9804524Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9805349Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9805783Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9806158Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.9806541Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9806991Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9807444Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9808118Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9808614Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9809076Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9809447Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.9810053Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9810538Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9811144Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9811589Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9812002Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.9812400Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.9812785Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.9813424Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9813859Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9814355Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9814831Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9815291Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9815772Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9816207Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9816662Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9817194Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9817582Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9817947Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9818420Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9818789Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9819345Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9819795Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9820390Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9820684Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9822536Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9823071Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9823956Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9824483Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9825234Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9825809Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9826552Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9827197Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9827712Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9828533Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9828913Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9829677Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9829785Z ('RERUN', {'yellow': True}) [0.2539s] [  0%]
2025-12-04T10:35:19.9830760Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9831587Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9831950Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9832327Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.9832708Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9833157Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9833654Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9834180Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9834678Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9835147Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9835529Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.9836056Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9836543Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9837002Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9837448Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9837860Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.9838262Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.9838653Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.9839291Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9839726Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9840301Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9840780Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9841247Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9841724Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9842163Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9842620Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9843044Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9843434Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9843798Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9844270Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9844682Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9845160Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9845680Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9846304Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9846600Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9848453Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9848918Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9849797Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9850323Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9851085Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9851661Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9852486Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9853136Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9853655Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9854682Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9854999Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9855806Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9855895Z FAILED [0.2518s] [  0%]
2025-12-04T10:35:19.9855900Z 
2025-12-04T10:35:19.9856020Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9856343Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9856445Z Traceback (most recent call last):
2025-12-04T10:35:19.9856781Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9856948Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9857364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9857577Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9858015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9858176Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9858607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9858726Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9859229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9859502Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9859945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9860063Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9860471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9860576Z     return self._compile_to_module()
2025-12-04T10:35:19.9860983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9861117Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9861553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9861657Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9862079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9862271Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9862851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9862954Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9863370Z   File "/tmp/tmpii_aqm5m/dm/cdmq4hscslc3dxrhyn4irizq3gehd4b6o2o37xojoqw45umw3dlc.py", line 58, in <module>
2025-12-04T10:35:19.9863761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9863851Z     kernel.precompile(
2025-12-04T10:35:19.9864318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9864417Z     self._precompile_worker()
2025-12-04T10:35:19.9864924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9865074Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9865615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9865793Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9866176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9866378Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9866796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9867078Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9867307Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9867755Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9867827Z ^
2025-12-04T10:35:19.9868217Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9868222Z 
2025-12-04T10:35:19.9868833Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9868838Z 
2025-12-04T10:35:19.9868842Z 
2025-12-04T10:35:19.9869018Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9869665Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9869672Z 
2025-12-04T10:35:19.9869892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9870074Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9870155Z frames [('total', 1)]
2025-12-04T10:35:19.9870253Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9870457Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9870640Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9870716Z graph_break []
2025-12-04T10:35:19.9870964Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9871062Z Traceback (most recent call last):
2025-12-04T10:35:19.9871405Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9871534Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9871941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9872152Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9872671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9872829Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9873262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9873381Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9873839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9874107Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9874545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9874672Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9875078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9875183Z     return self._compile_to_module()
2025-12-04T10:35:19.9875635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9875780Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9876215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9876361Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9876776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9877015Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9877512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9877623Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9878068Z   File "/tmp/tmpswodzstu/km/ckmuvog2sm7j37zwknidtfsvo2apzyznlu6sudtjbnnfbedyv6ef.py", line 58, in <module>
2025-12-04T10:35:19.9878457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9878549Z     kernel.precompile(
2025-12-04T10:35:19.9879018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9879119Z     self._precompile_worker()
2025-12-04T10:35:19.9879623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9879775Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9880285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9880451Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9880833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9881038Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9881406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9881692Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9881881Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9882328Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9882406Z ^
2025-12-04T10:35:19.9882791Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9882795Z 
2025-12-04T10:35:19.9883482Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9883487Z 
2025-12-04T10:35:19.9883491Z 
2025-12-04T10:35:19.9883669Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9884308Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9884316Z 
2025-12-04T10:35:19.9884534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9884713Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9884798Z frames [('total', 1)]
2025-12-04T10:35:19.9884890Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9885085Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9885282Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9885365Z graph_break []
2025-12-04T10:35:19.9885570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9885669Z frames [('total', 1)]
2025-12-04T10:35:19.9885765Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9885949Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9886186Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9886261Z graph_break []
2025-12-04T10:35:19.9886386Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9886674Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9886774Z Traceback (most recent call last):
2025-12-04T10:35:19.9887109Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9887242Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9887655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9887861Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9888292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9888454Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9888881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9889006Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9889452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9889726Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9890177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9890294Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9890694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9890800Z     return self._compile_to_module()
2025-12-04T10:35:19.9891205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9891341Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9891778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9891882Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9892404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9892598Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9893106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9893206Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9893639Z   File "/tmp/tmpeepljok7/2j/c2jg7xzpv7phtsa45bia2pdg4bfryj76begqwidhfobbf3bkzz7x.py", line 58, in <module>
2025-12-04T10:35:19.9894036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9894127Z     kernel.precompile(
2025-12-04T10:35:19.9894593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9894693Z     self._precompile_worker()
2025-12-04T10:35:19.9895204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9895351Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9895854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9896016Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9896440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9896641Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9897052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9897334Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9897535Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9897985Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9898050Z ^
2025-12-04T10:35:19.9898439Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9898448Z 
2025-12-04T10:35:19.9899115Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9899123Z 
2025-12-04T10:35:19.9899127Z 
2025-12-04T10:35:19.9899310Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9899947Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9899951Z 
2025-12-04T10:35:19.9900181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9900366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9900446Z frames [('total', 1)]
2025-12-04T10:35:19.9900540Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9900740Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9900922Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9900999Z graph_break []
2025-12-04T10:35:19.9901176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9901255Z frames [('total', 1)]
2025-12-04T10:35:19.9901350Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9901534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9901726Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9901808Z graph_break []
2025-12-04T10:35:19.9902060Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9902141Z frames [('total', 1)]
2025-12-04T10:35:19.9902240Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9902417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9902608Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9902694Z graph_break []
2025-12-04T10:35:19.9903247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml -
2025-12-04T10:35:19.9903388Z =========================== short test summary info ============================
2025-12-04T10:35:19.9904011Z FAILED [0.2518s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9904459Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9904530Z ^
2025-12-04T10:35:19.9904916Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9904920Z 
2025-12-04T10:35:19.9905572Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9905624Z 
2025-12-04T10:35:19.9905627Z 
2025-12-04T10:35:19.9905805Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9906444Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9906489Z 
2025-12-04T10:35:19.9906710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9906862Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9907029Z ================== 1 failed, 13 deselected, 2 rerun in 2.24s ===================
2025-12-04T10:35:19.9907104Z Got exit code 1
2025-12-04T10:35:19.9907188Z Retrying single test...
2025-12-04T10:35:19.9907590Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml
2025-12-04T10:35:19.9907722Z ============================= test session starts ==============================
2025-12-04T10:35:19.9908182Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9908270Z cachedir: .pytest_cache
2025-12-04T10:35:19.9908715Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9908821Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9908909Z configfile: pytest.ini
2025-12-04T10:35:19.9909373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9909560Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.9910125Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9910224Z Running 1 items in this shard
2025-12-04T10:35:19.9910228Z 
2025-12-04T10:35:19.9911206Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9912167Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9912529Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9912903Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.9913289Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9913739Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9914195Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9914684Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9915179Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9915705Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9916077Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.9916677Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9917162Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9917665Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9918111Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9918519Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.9918920Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.9919309Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.9919951Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9920384Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9920880Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9921365Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9921829Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9922312Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9922749Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9923199Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9923715Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9924104Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9924473Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9924948Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9925312Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9925849Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9926296Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9926899Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9927195Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9929100Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9929602Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9930491Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9931022Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9931775Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9932351Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9933094Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9933749Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9934267Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9935095Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9935505Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9936259Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9936370Z ('RERUN', {'yellow': True}) [1.6737s] [100%]
2025-12-04T10:35:19.9937343Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9938165Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9938526Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9938902Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.9939326Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9939821Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9940279Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9940811Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9941317Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9941782Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9942159Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.9942690Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9943179Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9943629Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9944075Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9944490Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.9944889Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.9945275Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.9945920Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9946352Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9946927Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9947407Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9947870Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9948356Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9948793Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9949255Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9949679Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9950067Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9950436Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9950910Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9951323Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9951841Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9952291Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9952887Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9953185Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9955041Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9955535Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9956432Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9956969Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9957721Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9958372Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9959120Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9959772Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9960286Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9961111Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9961414Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9962169Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9962276Z ('RERUN', {'yellow': True}) [0.2523s] [100%]
2025-12-04T10:35:19.9963247Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9964112Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9964509Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:19.9964887Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:19.9965271Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:19.9965773Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9966226Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9966715Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9967203Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9967670Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9968045Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:19.9968569Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9969056Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9969506Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9970032Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9970444Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:19.9970843Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:19.9971231Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:19.9971870Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9972302Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9976450Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9976954Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9977423Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9977902Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9978402Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9978927Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9979438Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9979830Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9980198Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:19.9980675Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9981046Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:19.9981525Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9981980Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9982582Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9982881Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:19.9984747Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9985289Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9986174Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9986703Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9987463Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9988036Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9988784Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9989435Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9990071Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9990894Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9991235Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9991997Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9992081Z FAILED [0.2512s] [100%]
2025-12-04T10:35:19.9992087Z 
2025-12-04T10:35:19.9992206Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9992452Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9992554Z Traceback (most recent call last):
2025-12-04T10:35:19.9992888Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9993018Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9993430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9993642Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9994076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9994235Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9994665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9994786Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9995243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9995511Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9995956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9996074Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9996558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9996658Z     return self._compile_to_module()
2025-12-04T10:35:19.9997062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9997198Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9997633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9997738Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9998154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9998347Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9998847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9998952Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9999364Z   File "/tmp/tmpeo1z9_ac/ki/ckiqfrau675jrcajjq225jzsvpaepza2evcxwh7p7veeyxvro6bx.py", line 58, in <module>
2025-12-04T10:35:19.9999757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9999843Z     kernel.precompile(
2025-12-04T10:35:20.0000357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0000457Z     self._precompile_worker()
2025-12-04T10:35:20.0001000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0001150Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0001657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0001819Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0002198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0002398Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0002770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0003052Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0003241Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0003691Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0003759Z ^
2025-12-04T10:35:20.0004148Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0004153Z 
2025-12-04T10:35:20.0004758Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0004763Z 
2025-12-04T10:35:20.0004768Z 
2025-12-04T10:35:20.0004944Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0005587Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0005594Z 
2025-12-04T10:35:20.0005815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0005995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0006079Z frames [('total', 1)]
2025-12-04T10:35:20.0006251Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0006455Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0006637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0006713Z graph_break []
2025-12-04T10:35:20.0006958Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0007053Z Traceback (most recent call last):
2025-12-04T10:35:20.0007387Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0007515Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0008084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0008298Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0008734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0008891Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0009323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0009439Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0009890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0010233Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0010670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0010846Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0011246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0011349Z     return self._compile_to_module()
2025-12-04T10:35:20.0011755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0011885Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0012320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0012426Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0012841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0013031Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0013527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0013629Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0014057Z   File "/tmp/tmprza68jrp/5r/c5rfu6b2g2c7rswcp6uwh5bi6rtxldexbdcwxcfypb53wsc5i2mk.py", line 58, in <module>
2025-12-04T10:35:20.0014448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0014537Z     kernel.precompile(
2025-12-04T10:35:20.0015003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0015101Z     self._precompile_worker()
2025-12-04T10:35:20.0015604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0015752Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0016260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0016423Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0016909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0017116Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0017485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0017767Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0017960Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0018405Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0018477Z ^
2025-12-04T10:35:20.0018860Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0018865Z 
2025-12-04T10:35:20.0019521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0019527Z 
2025-12-04T10:35:20.0019531Z 
2025-12-04T10:35:20.0019708Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0020338Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0020421Z 
2025-12-04T10:35:20.0020642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0020857Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0020942Z frames [('total', 1)]
2025-12-04T10:35:20.0021032Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0021228Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0021422Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0021502Z graph_break []
2025-12-04T10:35:20.0021681Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0021760Z frames [('total', 1)]
2025-12-04T10:35:20.0021850Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0022030Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0022227Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0022301Z graph_break []
2025-12-04T10:35:20.0022419Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0022662Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0022761Z Traceback (most recent call last):
2025-12-04T10:35:20.0023094Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0023221Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0023632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0023835Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0024265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0024433Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0024861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0024983Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0025435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0025732Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0026423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0026599Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0027121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0027260Z     return self._compile_to_module()
2025-12-04T10:35:20.0027815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0028004Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0028495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0028599Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0029021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0029211Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0029708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0029809Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0030243Z   File "/tmp/tmpl5qag5at/2n/c2neobpztvrnquo7jjy4vsoucdfkatytianoxd34445csppfjpoc.py", line 58, in <module>
2025-12-04T10:35:20.0030693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0030820Z     kernel.precompile(
2025-12-04T10:35:20.0031289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0031383Z     self._precompile_worker()
2025-12-04T10:35:20.0031890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0032038Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0032539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0032700Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0033083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0033283Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0033656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0033935Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0034123Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0034577Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0034646Z ^
2025-12-04T10:35:20.0035031Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0035038Z 
2025-12-04T10:35:20.0035638Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0035645Z 
2025-12-04T10:35:20.0035649Z 
2025-12-04T10:35:20.0035826Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0036466Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0036472Z 
2025-12-04T10:35:20.0036770Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0036951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0037035Z frames [('total', 1)]
2025-12-04T10:35:20.0037127Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0037328Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0037510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0037591Z graph_break []
2025-12-04T10:35:20.0037768Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0037846Z frames [('total', 1)]
2025-12-04T10:35:20.0037949Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0038127Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0038319Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0038395Z graph_break []
2025-12-04T10:35:20.0038571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0038650Z frames [('total', 1)]
2025-12-04T10:35:20.0038742Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0038917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0039107Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0039187Z graph_break []
2025-12-04T10:35:20.0039787Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml -
2025-12-04T10:35:20.0039927Z =========================== short test summary info ============================
2025-12-04T10:35:20.0040592Z FAILED [0.2512s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0041045Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0041113Z ^
2025-12-04T10:35:20.0041499Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0041503Z 
2025-12-04T10:35:20.0042105Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0042113Z 
2025-12-04T10:35:20.0042117Z 
2025-12-04T10:35:20.0042294Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0042928Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0042936Z 
2025-12-04T10:35:20.0043156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0043307Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0043473Z ================== 1 failed, 187 deselected, 2 rerun in 2.21s ==================
2025-12-04T10:35:20.0043548Z Got exit code 1
2025-12-04T10:35:20.0043631Z Retrying single test...
2025-12-04T10:35:20.0044033Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml
2025-12-04T10:35:20.0044166Z ============================= test session starts ==============================
2025-12-04T10:35:20.0044456Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0044542Z cachedir: .pytest_cache
2025-12-04T10:35:20.0044986Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0045087Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0045171Z configfile: pytest.ini
2025-12-04T10:35:20.0045710Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0045899Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0046460Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0046556Z Running 1 items in this shard
2025-12-04T10:35:20.0046563Z 
2025-12-04T10:35:20.0047543Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:20.0048375Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0048733Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0049108Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.0049493Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0049987Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0050441Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0050970Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0051463Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0051932Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0052305Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.0052839Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.0053324Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.0053768Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.0054216Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0054625Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.0055028Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.0055418Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:20.0056059Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.0056491Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:20.0057084Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0057568Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:20.0058031Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:20.0058513Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:20.0058950Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0059475Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0059905Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:20.0060292Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0060657Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0061174Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0061536Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0062057Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0062512Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0063111Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0063407Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0065271Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0065726Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0066610Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0067139Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0067891Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0068540Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0069283Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0069938Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0070454Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0071280Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0071584Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0072339Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0072448Z ('RERUN', {'yellow': True}) [1.6776s] [100%]
2025-12-04T10:35:20.0073462Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:20.0074490Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0074968Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0075396Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.0075779Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0076228Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0076687Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0077178Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0077674Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0078138Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0078511Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.0079045Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.0079534Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.0079988Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.0080527Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0080940Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.0081344Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.0081732Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:20.0082372Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.0082806Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:20.0083306Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0083785Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:20.0084251Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:20.0084774Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:20.0085210Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0085755Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0086183Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:20.0086568Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0086937Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0087413Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0087782Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0088365Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0088813Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0089423Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0089724Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0091587Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0092126Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0093013Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0093543Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0094298Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0094877Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0095621Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0096275Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0096830Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0097658Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0098008Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0098767Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0098872Z ('RERUN', {'yellow': True}) [0.2524s] [100%]
2025-12-04T10:35:20.0099897Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:20.0100724Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0101086Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0101460Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.0101843Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0102292Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0102746Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0103235Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0103727Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0104301Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0104680Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.0105208Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.0105746Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.0106194Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.0106635Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0107051Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.0107451Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.0107982Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_0 = r0_index
2025-12-04T10:35:20.0108698Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.0109203Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:20.0109704Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0110182Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:20.0110647Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:20.0111124Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:20.0111562Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0112018Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0112440Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:20.0112835Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0113199Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0113672Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0114039Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0114517Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0114968Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0115722Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0116021Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0117876Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0118336Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0119219Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0119748Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0120545Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0121169Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0121924Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0122580Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0123105Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0123937Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0124240Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0125000Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0125080Z FAILED [0.2500s] [100%]
2025-12-04T10:35:20.0125085Z 
2025-12-04T10:35:20.0125208Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0125450Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0125553Z Traceback (most recent call last):
2025-12-04T10:35:20.0125891Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0126020Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0126443Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0126650Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0127167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0127332Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0127764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0127886Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0128341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0128610Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0129069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0129191Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0129594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0129692Z     return self._compile_to_module()
2025-12-04T10:35:20.0130100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0130234Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0130716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0130821Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0131244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0131479Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0131977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0132086Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0132514Z   File "/tmp/tmpsdp16qvd/oq/coqvl7e4avnrb4webtk7gnbgy4jwbaj35i6key6dw7uioiq6dn35.py", line 58, in <module>
2025-12-04T10:35:20.0132911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0132996Z     kernel.precompile(
2025-12-04T10:35:20.0133469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0133567Z     self._precompile_worker()
2025-12-04T10:35:20.0134075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0134228Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0134733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0134899Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0135282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0135483Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0135856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0136143Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0136333Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0136783Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0136852Z ^
2025-12-04T10:35:20.0137323Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0137328Z 
2025-12-04T10:35:20.0137932Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0137937Z 
2025-12-04T10:35:20.0137941Z 
2025-12-04T10:35:20.0138119Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0138764Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0138771Z 
2025-12-04T10:35:20.0138991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0139215Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0139303Z frames [('total', 1)]
2025-12-04T10:35:20.0139398Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0139611Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0139794Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0139870Z graph_break []
2025-12-04T10:35:20.0140116Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0140213Z Traceback (most recent call last):
2025-12-04T10:35:20.0140588Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0140720Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0141128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0141404Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0141835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0141996Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0142431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0142545Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0142996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0143268Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0143705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0143834Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0144235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0144340Z     return self._compile_to_module()
2025-12-04T10:35:20.0144750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0144881Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0145320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0145427Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0145842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0146034Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0146535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0146640Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0147162Z   File "/tmp/tmpc12taikz/av/cavvtprixkkgxxwzzwpwqn4efewmgqhtskoya27wplgesdx3fcwk.py", line 58, in <module>
2025-12-04T10:35:20.0147557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0147649Z     kernel.precompile(
2025-12-04T10:35:20.0148115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0148218Z     self._precompile_worker()
2025-12-04T10:35:20.0148723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0148870Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0149375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0149536Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0149918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0150121Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0150489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0150771Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0151003Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0151452Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0151562Z ^
2025-12-04T10:35:20.0151952Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0151957Z 
2025-12-04T10:35:20.0152570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0152575Z 
2025-12-04T10:35:20.0152578Z 
2025-12-04T10:35:20.0152756Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0153388Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0153402Z 
2025-12-04T10:35:20.0153624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0153805Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0153893Z frames [('total', 1)]
2025-12-04T10:35:20.0153983Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0154183Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0154381Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0154459Z graph_break []
2025-12-04T10:35:20.0154632Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0154713Z frames [('total', 1)]
2025-12-04T10:35:20.0154803Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0154986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0155186Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0155263Z graph_break []
2025-12-04T10:35:20.0155392Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0155637Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0155735Z Traceback (most recent call last):
2025-12-04T10:35:20.0156068Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0156360Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0156777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0156985Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0157423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0157588Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0158020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0158141Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0158596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0158868Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0159314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0159431Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0159832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0159931Z     return self._compile_to_module()
2025-12-04T10:35:20.0160380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0160516Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0160996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0161105Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0161530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0161719Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0162218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0162320Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0162750Z   File "/tmp/tmpv11tio94/jb/cjbvpmlg5e7xcmkzlsydiijqgtcyl4kztxx55xndnk4zpwgtn6x5.py", line 58, in <module>
2025-12-04T10:35:20.0163145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0163230Z     kernel.precompile(
2025-12-04T10:35:20.0163699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0163795Z     self._precompile_worker()
2025-12-04T10:35:20.0164301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0164450Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0164952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0165113Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0165498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0165697Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0166071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0166359Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0166545Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0167078Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0167149Z ^
2025-12-04T10:35:20.0167535Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0167547Z 
2025-12-04T10:35:20.0168149Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0168156Z 
2025-12-04T10:35:20.0168160Z 
2025-12-04T10:35:20.0168335Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0168971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0168976Z 
2025-12-04T10:35:20.0169282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0169465Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0169544Z frames [('total', 1)]
2025-12-04T10:35:20.0169635Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0169836Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0170017Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0170140Z graph_break []
2025-12-04T10:35:20.0170319Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0170396Z frames [('total', 1)]
2025-12-04T10:35:20.0170529Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0170707Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0170902Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0170979Z graph_break []
2025-12-04T10:35:20.0171159Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0171239Z frames [('total', 1)]
2025-12-04T10:35:20.0171331Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0171507Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0171696Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0171777Z graph_break []
2025-12-04T10:35:20.0172334Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml -
2025-12-04T10:35:20.0172474Z =========================== short test summary info ============================
2025-12-04T10:35:20.0173088Z FAILED [0.2500s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0173539Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0173608Z ^
2025-12-04T10:35:20.0174001Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0174005Z 
2025-12-04T10:35:20.0174613Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0174620Z 
2025-12-04T10:35:20.0174624Z 
2025-12-04T10:35:20.0174801Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0175433Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0175440Z 
2025-12-04T10:35:20.0175663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0175887Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0176057Z ================== 1 failed, 187 deselected, 2 rerun in 2.21s ==================
2025-12-04T10:35:20.0176134Z Got exit code 1
2025-12-04T10:35:20.0176557Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0176911Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.0177307Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml
2025-12-04T10:35:20.0177448Z ============================= test session starts ==============================
2025-12-04T10:35:20.0177738Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0177830Z cachedir: .pytest_cache
2025-12-04T10:35:20.0178282Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0178381Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0178470Z configfile: pytest.ini
2025-12-04T10:35:20.0178932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0179168Z collecting ... collected 188 items / 14 deselected / 174 selected
2025-12-04T10:35:20.0179359Z stepcurrent: skipping 14 already run items.
2025-12-04T10:35:20.0179451Z Running 174 items in this shard
2025-12-04T10:35:20.0179456Z 
2025-12-04T10:35:20.0180452Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0181236Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0181594Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0181967Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0182402Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0182794Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0183245Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0183702Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0184203Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0184695Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0185164Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0185584Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0186024Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0186419Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0186882Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0187258Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0187754Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0188192Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0188647Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0189136Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0189621Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0190146Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0190573Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0191003Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0191368Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0191893Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0192263Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0192748Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0193192Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0193790Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0194093Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0195884Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0196338Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0197222Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0197755Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0198587Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0199170Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0199915Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0200569Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0201088Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0201827Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0202133Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0202887Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0203043Z ('RERUN', {'yellow': True}) [1.9725s] [  0%]
2025-12-04T10:35:20.0204069Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0204805Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0205164Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0205531Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0206026Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0206407Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0206859Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0207317Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0207962Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0208462Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0208933Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0209297Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0209736Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0210251Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0210639Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0211008Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0211502Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0211952Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0212406Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0212904Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0213383Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0213909Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0214393Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0214785Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0215213Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0215744Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0216116Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0216595Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0217041Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0217646Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0217950Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0219734Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0220192Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0221079Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0221723Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0222479Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0223049Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0223796Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0224454Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0224975Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0225715Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0226015Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0226815Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0226959Z ('RERUN', {'yellow': True}) [0.3892s] [  0%]
2025-12-04T10:35:20.0227952Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0228685Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0229041Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0229414Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0229849Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0230236Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0230692Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0231147Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0231637Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0232129Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0232609Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0232978Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0233489Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0233889Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0234268Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0234641Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0235139Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0235578Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0236035Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0236525Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0237007Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0237530Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0238010Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0238400Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0238807Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0239288Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0239656Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0240133Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0240587Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0241188Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0241495Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0243221Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0243677Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0244561Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0245175Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0245928Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0246500Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0247249Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0247906Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0248424Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0249155Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0249509Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0250265Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0250385Z FAILED [0.3863s] [  0%]
2025-12-04T10:35:20.0250389Z 
2025-12-04T10:35:20.0250508Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0250761Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0250862Z Traceback (most recent call last):
2025-12-04T10:35:20.0251193Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0251320Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0251732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0251942Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0252374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0252535Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0252967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0253091Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0253542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0253811Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0254256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0254378Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0254789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0254891Z     return self._compile_to_module()
2025-12-04T10:35:20.0255298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0255434Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0255952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0256058Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0256482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0256673Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0257178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0257279Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0257717Z   File "/tmp/tmp3r1dizft/dw/cdwt2lywtnk5z527vsp3g7wsnxlvgovzbfo7fyv6ykrzehhwupqk.py", line 118, in <module>
2025-12-04T10:35:20.0261965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0262066Z     kernel.precompile(
2025-12-04T10:35:20.0262560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0262659Z     self._precompile_worker()
2025-12-04T10:35:20.0263166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0263317Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0263891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0264060Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0264481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0264684Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0265066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0265348Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0265548Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0265918Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0265988Z ^
2025-12-04T10:35:20.0266375Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0266381Z 
2025-12-04T10:35:20.0266986Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0266992Z 
2025-12-04T10:35:20.0266995Z 
2025-12-04T10:35:20.0267174Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0267829Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0267836Z 
2025-12-04T10:35:20.0268057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0268241Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0268327Z frames [('total', 1)]
2025-12-04T10:35:20.0268418Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0268616Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0268800Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0268884Z graph_break []
2025-12-04T10:35:20.0269127Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0269233Z Traceback (most recent call last):
2025-12-04T10:35:20.0269678Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0269807Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0270216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0270424Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0270859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0271024Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0271451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0271572Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0272028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0272297Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0272740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0272858Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0273260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0273405Z     return self._compile_to_module()
2025-12-04T10:35:20.0273810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0273986Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0274426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0274536Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0274954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0275144Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0275665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0275796Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0276215Z   File "/tmp/tmp11_fddqf/ga/cga3hgv4qpzsymnco5mighyf4awpn5cxvjoxb5wf3wn7cpoaxeb3.py", line 118, in <module>
2025-12-04T10:35:20.0276613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0276707Z     kernel.precompile(
2025-12-04T10:35:20.0277175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0277274Z     self._precompile_worker()
2025-12-04T10:35:20.0277776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0277920Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0278424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0278591Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0278970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0279173Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0279541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0279905Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0280097Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0280459Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0280529Z ^
2025-12-04T10:35:20.0280912Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0280919Z 
2025-12-04T10:35:20.0281526Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0281533Z 
2025-12-04T10:35:20.0281537Z 
2025-12-04T10:35:20.0281714Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0282368Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0282373Z 
2025-12-04T10:35:20.0282594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0282770Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0282859Z frames [('total', 1)]
2025-12-04T10:35:20.0282953Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0283149Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0283376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0283454Z graph_break []
2025-12-04T10:35:20.0283630Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0283778Z frames [('total', 1)]
2025-12-04T10:35:20.0283870Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0284052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0284248Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0284328Z graph_break []
2025-12-04T10:35:20.0284446Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0284688Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0284787Z Traceback (most recent call last):
2025-12-04T10:35:20.0285119Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0285246Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0285661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0285868Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0286301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0286464Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0286892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0287010Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0287457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0287727Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0288165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0288286Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0288689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0288786Z     return self._compile_to_module()
2025-12-04T10:35:20.0289280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0289420Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0289852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0289959Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0290380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0290571Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0291071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0291173Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0291611Z   File "/tmp/tmpuq2q77kl/nb/cnbfqdj55nl5pli74rjeyhi3zqxsld57w6qxruczhqjett2weamt.py", line 118, in <module>
2025-12-04T10:35:20.0292004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0292094Z     kernel.precompile(
2025-12-04T10:35:20.0292564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0292655Z     self._precompile_worker()
2025-12-04T10:35:20.0293200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0293347Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0293890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0294055Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0294441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0294640Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0295010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0295291Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0295506Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0295891Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0295962Z ^
2025-12-04T10:35:20.0296348Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0296352Z 
2025-12-04T10:35:20.0296960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0296964Z 
2025-12-04T10:35:20.0296968Z 
2025-12-04T10:35:20.0297149Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0297791Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0297798Z 
2025-12-04T10:35:20.0298019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0298197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0298282Z frames [('total', 1)]
2025-12-04T10:35:20.0298376Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0298573Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0298755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0298834Z graph_break []
2025-12-04T10:35:20.0299142Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0299229Z frames [('total', 1)]
2025-12-04T10:35:20.0299327Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0299506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0299697Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0299782Z graph_break []
2025-12-04T10:35:20.0299957Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0300041Z frames [('total', 1)]
2025-12-04T10:35:20.0300132Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0300313Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0300504Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0300585Z graph_break []
2025-12-04T10:35:20.0301141Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml -
2025-12-04T10:35:20.0301283Z =========================== short test summary info ============================
2025-12-04T10:35:20.0301912Z FAILED [0.3863s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0302278Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0302390Z ^
2025-12-04T10:35:20.0302775Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0302844Z 
2025-12-04T10:35:20.0303444Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0303449Z 
2025-12-04T10:35:20.0303452Z 
2025-12-04T10:35:20.0303636Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0304279Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0304284Z 
2025-12-04T10:35:20.0304506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0304654Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0304822Z ================== 1 failed, 14 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:35:20.0304900Z Got exit code 1
2025-12-04T10:35:20.0304993Z Retrying single test...
2025-12-04T10:35:20.0305392Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml
2025-12-04T10:35:20.0305521Z ============================= test session starts ==============================
2025-12-04T10:35:20.0305819Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0305905Z cachedir: .pytest_cache
2025-12-04T10:35:20.0306346Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0306451Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0306538Z configfile: pytest.ini
2025-12-04T10:35:20.0307002Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0307183Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0307926Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0308021Z Running 1 items in this shard
2025-12-04T10:35:20.0308026Z 
2025-12-04T10:35:20.0309151Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0309897Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0310255Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0310625Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0311064Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0311452Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0311900Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0312351Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0312900Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0313388Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0313910Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0314287Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0314721Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0315119Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0315529Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0315926Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0316425Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0316862Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0317320Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0317804Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0318283Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0318812Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0319241Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0319630Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0320073Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0320549Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0320915Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0321395Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0321844Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0322443Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0322750Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0324480Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0325011Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0326012Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0326579Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0327386Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0328007Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0328811Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0329511Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0330066Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0330854Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0331178Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0331997Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0332269Z ('RERUN', {'yellow': True}) [2.0059s] [100%]
2025-12-04T10:35:20.0333262Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0333990Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0334349Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0334722Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0335162Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0335547Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0335997Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0336451Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0336981Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0337510Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0337977Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0338347Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0338781Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0339216Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0339601Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0339971Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0340466Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0340908Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0341359Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0341841Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0342324Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0342845Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0343273Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0343744Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0344113Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0344588Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0344949Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0345443Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0346067Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0346848Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0347195Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0348927Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0349513Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0350402Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0350933Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0351690Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0352264Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0353014Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0353666Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0354177Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0354910Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0355214Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0356046Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0356156Z ('RERUN', {'yellow': True}) [0.4051s] [100%]
2025-12-04T10:35:20.0357150Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0357887Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0358242Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0358610Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0359051Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0359433Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0359883Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0360380Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0360867Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0361506Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0361977Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0362345Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0362781Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0363178Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0363560Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0363932Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0364429Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0364870Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0365324Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0365812Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0366293Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0366818Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0367357Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0367754Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0368118Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0368590Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0368957Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0369434Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0369885Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0370486Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0370782Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0372505Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0373038Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0373921Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0374447Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0375204Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0375826Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0376579Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0377230Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0377741Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0378478Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0378779Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0379667Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0379751Z FAILED [0.3898s] [100%]
2025-12-04T10:35:20.0379756Z 
2025-12-04T10:35:20.0379873Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0380119Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0380222Z Traceback (most recent call last):
2025-12-04T10:35:20.0380555Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0380679Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0381092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0381305Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0381742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0381903Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0382331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0382446Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0382944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0383213Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0383694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0383811Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0384218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0384315Z     return self._compile_to_module()
2025-12-04T10:35:20.0384719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0384851Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0385320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0385466Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0385963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0386162Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0386657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0386766Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0387208Z   File "/tmp/tmpjero85qf/vu/cvupuyoingroaq2iflrglemabqdojijubjjtd5qqp7g3j3o27tbc.py", line 118, in <module>
2025-12-04T10:35:20.0387598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0387685Z     kernel.precompile(
2025-12-04T10:35:20.0388154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0388255Z     self._precompile_worker()
2025-12-04T10:35:20.0388758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0388905Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0389407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0389690Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0390100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0390316Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0390712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0391016Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0391218Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0391608Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0391679Z ^
2025-12-04T10:35:20.0392090Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0392104Z 
2025-12-04T10:35:20.0392756Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0392761Z 
2025-12-04T10:35:20.0392765Z 
2025-12-04T10:35:20.0392953Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0393647Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0393694Z 
2025-12-04T10:35:20.0393915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0394132Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0394215Z frames [('total', 1)]
2025-12-04T10:35:20.0394305Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0394507Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0394688Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0394766Z graph_break []
2025-12-04T10:35:20.0395012Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0395108Z Traceback (most recent call last):
2025-12-04T10:35:20.0395437Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0395566Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0395973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0396182Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0396611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0396772Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0397206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0397326Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0397777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0398055Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0398491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0398616Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0399017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0399113Z     return self._compile_to_module()
2025-12-04T10:35:20.0399612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0399751Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0400194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0400299Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0400718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0400923Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0401419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0401528Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0401971Z   File "/tmp/tmpnuib74iv/q5/cq53e3gj6wlonaq2mc2btbrb5nbvvvjfyf5jotxgxilvfkujxjrv.py", line 118, in <module>
2025-12-04T10:35:20.0402358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0402447Z     kernel.precompile(
2025-12-04T10:35:20.0402918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0403013Z     self._precompile_worker()
2025-12-04T10:35:20.0403562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0403708Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0404251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0404415Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0404793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0404998Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0405367Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0405697Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0405889Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0406246Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0406320Z ^
2025-12-04T10:35:20.0406705Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0406710Z 
2025-12-04T10:35:20.0407319Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0407326Z 
2025-12-04T10:35:20.0407330Z 
2025-12-04T10:35:20.0407508Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0408345Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0408354Z 
2025-12-04T10:35:20.0408580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0408755Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0408849Z frames [('total', 1)]
2025-12-04T10:35:20.0408940Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0409137Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0409322Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0409398Z graph_break []
2025-12-04T10:35:20.0409704Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0409790Z frames [('total', 1)]
2025-12-04T10:35:20.0409879Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0410058Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0410252Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0410334Z graph_break []
2025-12-04T10:35:20.0410452Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0410697Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0410799Z Traceback (most recent call last):
2025-12-04T10:35:20.0411139Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0411263Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0411684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0411893Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0412324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0412484Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0412974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0413091Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0413547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0413870Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0414314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0414434Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0414834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0414931Z     return self._compile_to_module()
2025-12-04T10:35:20.0415346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0415500Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0415963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0416072Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0416487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0416686Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0417180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0417286Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0417702Z   File "/tmp/tmppsffx5_k/7v/c7vzaa7nwy65vzavgac7zhgdl3nrjmja5a2yko4dvmr4egroo5ye.py", line 118, in <module>
2025-12-04T10:35:20.0418094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0418181Z     kernel.precompile(
2025-12-04T10:35:20.0418646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0418747Z     self._precompile_worker()
2025-12-04T10:35:20.0419323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0419552Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0420060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0420220Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0420598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0420807Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0421179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0421464Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0421653Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0422020Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0422089Z ^
2025-12-04T10:35:20.0422473Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0422477Z 
2025-12-04T10:35:20.0423085Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0423133Z 
2025-12-04T10:35:20.0423137Z 
2025-12-04T10:35:20.0423314Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0423963Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0424008Z 
2025-12-04T10:35:20.0424230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0424414Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0424494Z frames [('total', 1)]
2025-12-04T10:35:20.0424586Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0424783Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0424964Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0425042Z graph_break []
2025-12-04T10:35:20.0425220Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0425304Z frames [('total', 1)]
2025-12-04T10:35:20.0425393Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0425574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0425765Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0425853Z graph_break []
2025-12-04T10:35:20.0426025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0426102Z frames [('total', 1)]
2025-12-04T10:35:20.0426200Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0426378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0426573Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0426657Z graph_break []
2025-12-04T10:35:20.0427218Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml -
2025-12-04T10:35:20.0427360Z =========================== short test summary info ============================
2025-12-04T10:35:20.0428000Z FAILED [0.3898s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0428361Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0428434Z ^
2025-12-04T10:35:20.0428925Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0428931Z 
2025-12-04T10:35:20.0429540Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0429545Z 
2025-12-04T10:35:20.0429549Z 
2025-12-04T10:35:20.0429728Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0430374Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0430381Z 
2025-12-04T10:35:20.0430604Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0430750Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0430925Z ================== 1 failed, 187 deselected, 2 rerun in 2.84s ==================
2025-12-04T10:35:20.0431002Z Got exit code 1
2025-12-04T10:35:20.0431087Z Retrying single test...
2025-12-04T10:35:20.0431492Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml
2025-12-04T10:35:20.0431623Z ============================= test session starts ==============================
2025-12-04T10:35:20.0431915Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0432050Z cachedir: .pytest_cache
2025-12-04T10:35:20.0432495Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0432637Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0432723Z configfile: pytest.ini
2025-12-04T10:35:20.0433180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0433374Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0433946Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0434045Z Running 1 items in this shard
2025-12-04T10:35:20.0434049Z 
2025-12-04T10:35:20.0435047Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0435838Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0436200Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0436576Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0437017Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0437398Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0437846Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0438300Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0438794Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0439369Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0439842Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0440213Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0440650Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0441043Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0441432Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0441809Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0442307Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0442742Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0443197Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0443738Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0444255Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0444794Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0445218Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0445622Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0446024Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0446500Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0446872Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0447350Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0447891Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0448490Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0448787Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0450615Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0451071Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0451957Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0452490Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0453244Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0453819Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0454566Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0455220Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0455798Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0456607Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0456911Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0457673Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0457783Z ('RERUN', {'yellow': True}) [1.9923s] [100%]
2025-12-04T10:35:20.0458775Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0459558Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0459923Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0460302Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0460738Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0461127Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0461579Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0462041Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0462612Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0463103Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0463576Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0463949Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0464384Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0464783Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0465163Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0465542Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0466037Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0466472Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0466972Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0467458Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0468007Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0468535Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0468958Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0469350Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0469719Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0470196Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0470561Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0471042Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0471490Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0472085Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0472385Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0474193Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0474656Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0475538Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0476079Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0476835Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0477409Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0478157Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0478850Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0479405Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0480142Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0480446Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0481205Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0481313Z ('RERUN', {'yellow': True}) [0.3890s] [100%]
2025-12-04T10:35:20.0482302Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0483037Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0483394Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.0483761Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 320
2025-12-04T10:35:20.0484199Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0484582Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.0485029Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0485489Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0486103Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0486595Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0487059Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0487427Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.0487862Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0488257Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.0488642Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.0489010Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.0489506Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0489952Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0490448Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0490978Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0491459Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0491985Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0492413Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0492802Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0493172Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = -448.0
2025-12-04T10:35:20.0493648Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0494018Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = 448.0
2025-12-04T10:35:20.0494499Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0494945Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0495543Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0495891Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.0497707Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0498160Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0499093Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0499627Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0500388Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0500959Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0501700Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0502396Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0503033Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0503776Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0504076Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0504834Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0504918Z FAILED [0.3873s] [100%]
2025-12-04T10:35:20.0504923Z 
2025-12-04T10:35:20.0505038Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0505297Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0505397Z Traceback (most recent call last):
2025-12-04T10:35:20.0505779Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0505916Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0506325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0506536Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0506968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0507127Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0507560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0507678Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0508262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0508653Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0509097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0509223Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0509631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0509733Z     return self._compile_to_module()
2025-12-04T10:35:20.0510142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0510277Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0510717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0510824Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0511255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0511449Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0511945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0512052Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0512575Z   File "/tmp/tmpb8tnvy9q/hs/chscfevuhngazyj2gf4j23d7xcdsorzgbozgy5i4eweytdfv4bta.py", line 118, in <module>
2025-12-04T10:35:20.0512963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0513109Z     kernel.precompile(
2025-12-04T10:35:20.0513578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0513669Z     self._precompile_worker()
2025-12-04T10:35:20.0514183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0514329Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0514837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0515000Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0515381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0515612Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0516013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0516296Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0516491Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0516850Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0516921Z ^
2025-12-04T10:35:20.0517310Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0517315Z 
2025-12-04T10:35:20.0517922Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0517927Z 
2025-12-04T10:35:20.0517931Z 
2025-12-04T10:35:20.0518111Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0518757Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0518763Z 
2025-12-04T10:35:20.0519074Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0519257Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0519341Z frames [('total', 1)]
2025-12-04T10:35:20.0519433Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0519628Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0519819Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0519895Z graph_break []
2025-12-04T10:35:20.0520139Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0520253Z Traceback (most recent call last):
2025-12-04T10:35:20.0520585Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0520711Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0521123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0521330Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0521763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0521918Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0522395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0522518Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0522965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0523278Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0523721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0523839Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0524242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0524339Z     return self._compile_to_module()
2025-12-04T10:35:20.0524751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0524886Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0525318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0525433Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0525847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0526042Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0526544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0526646Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0527099Z   File "/tmp/tmpd4ubmg7c/sy/csyatmqpxjyqxwfhahrl4vh7cdfbspreduldqf7qhfakrpfl4hes.py", line 118, in <module>
2025-12-04T10:35:20.0527600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0527730Z     kernel.precompile(
2025-12-04T10:35:20.0528680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0529636Z     self._precompile_worker()
2025-12-04T10:35:20.0538635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0539885Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0540989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0542156Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0543114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0544172Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0545206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0546339Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0547220Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0548214Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0549077Z ^
2025-12-04T10:35:20.0549713Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0550314Z 
2025-12-04T10:35:20.0550922Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0551740Z 
2025-12-04T10:35:20.0551744Z 
2025-12-04T10:35:20.0551930Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0552867Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0553677Z 
2025-12-04T10:35:20.0553899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0554424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0554804Z frames [('total', 1)]
2025-12-04T10:35:20.0555033Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0555413Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0555940Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0556310Z graph_break []
2025-12-04T10:35:20.0556603Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0556979Z frames [('total', 1)]
2025-12-04T10:35:20.0557204Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0557551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0558037Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0558417Z graph_break []
2025-12-04T10:35:20.0558651Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0559137Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0559592Z Traceback (most recent call last):
2025-12-04T10:35:20.0560114Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0560682Z     y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0561328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0562055Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0562806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0563513Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0564207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0564952Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0565679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0566519Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0567343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0568017Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0568648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0569262Z     return self._compile_to_module()
2025-12-04T10:35:20.0569856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0570510Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0571190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0571840Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0572459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0573181Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0574027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0574737Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0575420Z   File "/tmp/tmppv1w55k7/z5/cz5bjnqeovnr7mbxzhf5hcl64pmdawkpjodswz5gb5cju2bmqezn.py", line 118, in <module>
2025-12-04T10:35:20.0576422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0577018Z     kernel.precompile(
2025-12-04T10:35:20.0577637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0578312Z     self._precompile_worker()
2025-12-04T10:35:20.0578985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0579826Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0580583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0581364Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0582021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0582718Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0583411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0584178Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0584763Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0585418Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0585961Z ^
2025-12-04T10:35:20.0586440Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0586941Z 
2025-12-04T10:35:20.0587546Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0588263Z 
2025-12-04T10:35:20.0588267Z 
2025-12-04T10:35:20.0588449Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0589495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0590256Z 
2025-12-04T10:35:20.0590478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0590989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0591367Z frames [('total', 1)]
2025-12-04T10:35:20.0591595Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0591957Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0592450Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0592825Z graph_break []
2025-12-04T10:35:20.0593122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0593499Z frames [('total', 1)]
2025-12-04T10:35:20.0593725Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0594081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0594565Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0594943Z graph_break []
2025-12-04T10:35:20.0595233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0595605Z frames [('total', 1)]
2025-12-04T10:35:20.0595831Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0596223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0596707Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0597091Z graph_break []
2025-12-04T10:35:20.0597803Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml -
2025-12-04T10:35:20.0598606Z =========================== short test summary info ============================
2025-12-04T10:35:20.0599505Z FAILED [0.3873s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0600600Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0601139Z ^
2025-12-04T10:35:20.0601616Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0602125Z 
2025-12-04T10:35:20.0602727Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0603444Z 
2025-12-04T10:35:20.0603448Z 
2025-12-04T10:35:20.0603630Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0604562Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0605318Z 
2025-12-04T10:35:20.0605537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0606022Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0606451Z ================== 1 failed, 187 deselected, 2 rerun in 2.80s ==================
2025-12-04T10:35:20.0606804Z Got exit code 1
2025-12-04T10:35:20.0607371Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0608456Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.0609320Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml
2025-12-04T10:35:20.0609966Z ============================= test session starts ==============================
2025-12-04T10:35:20.0610644Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0611142Z cachedir: .pytest_cache
2025-12-04T10:35:20.0611734Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0612387Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0612667Z configfile: pytest.ini
2025-12-04T10:35:20.0613272Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0614025Z collecting ... collected 188 items / 15 deselected / 173 selected
2025-12-04T10:35:20.0614442Z stepcurrent: skipping 15 already run items.
2025-12-04T10:35:20.0614747Z Running 173 items in this shard
2025-12-04T10:35:20.0614921Z 
2025-12-04T10:35:20.0615300Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [1.8094s] [  0%]
2025-12-04T10:35:20.0616172Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2173s] [  1%]
2025-12-04T10:35:20.0617046Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.4767s] [  1%]
2025-12-04T10:35:20.0617921Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2386s] [  2%]
2025-12-04T10:35:20.0618802Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.4153s] [  2%]
2025-12-04T10:35:20.0619927Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  3%]
2025-12-04T10:35:20.0621053Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.7698s] [  4%]
2025-12-04T10:35:20.0622347Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6868s] [  4%]
2025-12-04T10:35:20.0623326Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 FAILED [0.6911s] [  4%]
2025-12-04T10:35:20.0623801Z 
2025-12-04T10:35:20.0623918Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0624393Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0624847Z Traceback (most recent call last):
2025-12-04T10:35:20.0625373Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0625978Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.0626616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0627344Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0628097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0628803Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0629501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0630162Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0630834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0631673Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0632502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0633175Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0633810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0634525Z     return self._compile_to_module()
2025-12-04T10:35:20.0635129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0635887Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0636558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0637217Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0637842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0638565Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0639366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0640078Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0640714Z   File "/tmp/tmpn9o5rqa_/he/che3ee5fwgdgzz2elixqzfqkaog7xykdwzwtlsuwigqocfit3hks.py", line 193, in <module>
2025-12-04T10:35:20.0641686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0642271Z     self._wait_futures(scope)
2025-12-04T10:35:20.0642858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0643972Z     kernel = result.result()
2025-12-04T10:35:20.0644510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0645132Z     return self.result_fn()
2025-12-04T10:35:20.0645698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0646317Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0646855Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0647294Z 
2025-12-04T10:35:20.0647467Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0647842Z Traceback (most recent call last):
2025-12-04T10:35:20.0648485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0649134Z     result = job()
2025-12-04T10:35:20.0649762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0650493Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0651183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0651856Z     self._precompile_worker()
2025-12-04T10:35:20.0652692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0653730Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0654610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0655543Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0656385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0657145Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0658058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0658990Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0659726Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0660563Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0661325Z ^
2025-12-04T10:35:20.0661941Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0662479Z 
2025-12-04T10:35:20.0662483Z 
2025-12-04T10:35:20.0663142Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0663948Z 
2025-12-04T10:35:20.0663951Z 
2025-12-04T10:35:20.0664183Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0665201Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0666028Z 
2025-12-04T10:35:20.0666277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0666986Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0667522Z frames [('total', 1)]
2025-12-04T10:35:20.0667845Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0668391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0669350Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0670202Z graph_break []
2025-12-04T10:35:20.0670666Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0671300Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0671830Z Traceback (most recent call last):
2025-12-04T10:35:20.0672523Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0673198Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.0674013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0674908Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0675763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0676634Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0677448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0678204Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0679024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0679981Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0680937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0681707Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0682457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0683202Z     return self._compile_to_module()
2025-12-04T10:35:20.0684036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0684762Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0685571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0686426Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0687227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0688054Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0689011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0689847Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0690553Z   File "/tmp/tmp4w4t2s34/t3/ct3eoqyx4525zlli6efa35d6a67do2d25lzzayqzlgzmidr2bec6.py", line 193, in <module>
2025-12-04T10:35:20.0691654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0692349Z     self._wait_futures(scope)
2025-12-04T10:35:20.0693023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0693890Z     kernel = result.result()
2025-12-04T10:35:20.0694541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0695204Z     return self.result_fn()
2025-12-04T10:35:20.0696019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0696708Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0697323Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0697931Z 
2025-12-04T10:35:20.0698189Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0698630Z Traceback (most recent call last):
2025-12-04T10:35:20.0699447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0700319Z     result = job()
2025-12-04T10:35:20.0701045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0701939Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0702811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0703606Z     self._precompile_worker()
2025-12-04T10:35:20.0704424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0705273Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0706210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0707136Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0708075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0708869Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0709718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0710609Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0711370Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0712149Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0712889Z ^
2025-12-04T10:35:20.0713472Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0713996Z 
2025-12-04T10:35:20.0714000Z 
2025-12-04T10:35:20.0714742Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0715567Z 
2025-12-04T10:35:20.0715738Z 
2025-12-04T10:35:20.0715967Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0716959Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0717785Z 
2025-12-04T10:35:20.0718078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0718698Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0719136Z frames [('total', 1)]
2025-12-04T10:35:20.0719548Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0720081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0721025Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0721928Z graph_break []
2025-12-04T10:35:20.0722303Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0722822Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0723386Z frames [('total', 1)]
2025-12-04T10:35:20.0723687Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0724143Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0725192Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0726126Z graph_break []
2025-12-04T10:35:20.0726499Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0727175Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0727782Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0728283Z Traceback (most recent call last):
2025-12-04T10:35:20.0728977Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0729668Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.0730356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0731259Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0732140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0732988Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0733774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0734561Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0735383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0736485Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0737380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0738211Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0738986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0739759Z     return self._compile_to_module()
2025-12-04T10:35:20.0740469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0741271Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0742046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0742972Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0743671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0744499Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0745554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0746379Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0747085Z   File "/tmp/tmpzz12oiau/3z/c3zpuujs6tubvaxkxduwi267o25fgvl76andvjwe7kffrs5h5o4a.py", line 193, in <module>
2025-12-04T10:35:20.0748193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0748881Z     self._wait_futures(scope)
2025-12-04T10:35:20.0749545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0750347Z     kernel = result.result()
2025-12-04T10:35:20.0750984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0751677Z     return self.result_fn()
2025-12-04T10:35:20.0752380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0753153Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0753795Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0754400Z 
2025-12-04T10:35:20.0754654Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0755143Z Traceback (most recent call last):
2025-12-04T10:35:20.0755952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0756777Z     result = job()
2025-12-04T10:35:20.0757539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0758321Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0759182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0759979Z     self._precompile_worker()
2025-12-04T10:35:20.0760708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0761646Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0762530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0763536Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0764277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0765098Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0765996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0766903Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0767517Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0768340Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0769073Z ^
2025-12-04T10:35:20.0769654Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0770176Z 
2025-12-04T10:35:20.0770180Z 
2025-12-04T10:35:20.0770958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0771788Z 
2025-12-04T10:35:20.0771792Z 
2025-12-04T10:35:20.0772005Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0773060Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0773792Z 
2025-12-04T10:35:20.0774179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0774798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0775239Z frames [('total', 1)]
2025-12-04T10:35:20.0775652Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0776102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0777042Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0777981Z graph_break []
2025-12-04T10:35:20.0778346Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0778888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0779457Z frames [('total', 1)]
2025-12-04T10:35:20.0779859Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0780371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0781403Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0782284Z graph_break []
2025-12-04T10:35:20.0782633Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0783244Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0783710Z frames [('total', 1)]
2025-12-04T10:35:20.0784028Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0784547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0785551Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0786374Z graph_break []
2025-12-04T10:35:20.0786801Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0787810Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml -
2025-12-04T10:35:20.0788763Z =========================== short test summary info ============================
2025-12-04T10:35:20.0789879Z FAILED [0.6911s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0790805Z 
2025-12-04T10:35:20.0791008Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0791520Z Traceback (most recent call last):
2025-12-04T10:35:20.0792312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0793026Z     result = job()
2025-12-04T10:35:20.0793783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0794655Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0795412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0796142Z     self._precompile_worker()
2025-12-04T10:35:20.0796827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0802941Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0803762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0804564Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0805234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0806014Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0806717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0807496Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0808242Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0808939Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0809547Z ^
2025-12-04T10:35:20.0810035Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0810550Z 
2025-12-04T10:35:20.0810554Z 
2025-12-04T10:35:20.0811160Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0811972Z 
2025-12-04T10:35:20.0811976Z 
2025-12-04T10:35:20.0812159Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0813115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0813825Z 
2025-12-04T10:35:20.0814057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0814560Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0815036Z ======== 1 failed, 5 passed, 1 skipped, 15 deselected, 2 rerun in 5.35s ========
2025-12-04T10:35:20.0815457Z Got exit code 1
2025-12-04T10:35:20.0815712Z Retrying single test...
2025-12-04T10:35:20.0816272Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml
2025-12-04T10:35:20.0816937Z ============================= test session starts ==============================
2025-12-04T10:35:20.0817497Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0817998Z cachedir: .pytest_cache
2025-12-04T10:35:20.0818598Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0819312Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0819599Z configfile: pytest.ini
2025-12-04T10:35:20.0820223Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0820987Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0821816Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0822546Z Running 1 items in this shard
2025-12-04T10:35:20.0822731Z 
2025-12-04T10:35:20.0823537Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:13.244626906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0824456Z 
2025-12-04T10:35:20.0824897Z [W1204 10:21:22.874945366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0825461Z 
2025-12-04T10:35:20.0826089Z [W1204 10:21:22.875200301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0826640Z 
2025-12-04T10:35:20.0827076Z [W1204 10:21:22.877505266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0827630Z 
2025-12-04T10:35:20.0828067Z [W1204 10:21:22.877692230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0828626Z 
2025-12-04T10:35:20.0829053Z [W1204 10:21:22.879827542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0829609Z 
2025-12-04T10:35:20.0830044Z [W1204 10:21:22.880170298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0830595Z 
2025-12-04T10:35:20.0831040Z [W1204 10:21:22.880345852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0831590Z 
2025-12-04T10:35:20.0832033Z [W1204 10:21:22.880756270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0832587Z 
2025-12-04T10:35:20.0833013Z [W1204 10:21:22.880926393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0833618Z 
2025-12-04T10:35:20.0834054Z [W1204 10:21:22.881389692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0834725Z 
2025-12-04T10:35:20.0835155Z [W1204 10:21:22.881559336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0835705Z 
2025-12-04T10:35:20.0836148Z [W1204 10:21:22.881907692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0836699Z 
2025-12-04T10:35:20.0837140Z [W1204 10:21:22.882075416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0837694Z 
2025-12-04T10:35:20.0838125Z [W1204 10:21:22.882399012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0838684Z 
2025-12-04T10:35:20.0839112Z [W1204 10:21:22.882564115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0839674Z 
2025-12-04T10:35:20.0840105Z [W1204 10:21:22.882883152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0840657Z 
2025-12-04T10:35:20.0841100Z [W1204 10:21:22.883053455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0841650Z 
2025-12-04T10:35:20.0841765Z ('RERUN', {'yellow': True}) [11.8361s] [100%]
2025-12-04T10:35:20.0842783Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:23.018839188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0843704Z 
2025-12-04T10:35:20.0844133Z [W1204 10:21:23.019208415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0844694Z 
2025-12-04T10:35:20.0845131Z [W1204 10:21:23.019377728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0845740Z 
2025-12-04T10:35:20.0846180Z [W1204 10:21:23.019854368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0846736Z 
2025-12-04T10:35:20.0847260Z [W1204 10:21:24.020048411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0847816Z 
2025-12-04T10:35:20.0848247Z [W1204 10:21:24.020354697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0848809Z 
2025-12-04T10:35:20.0849243Z [W1204 10:21:24.020599552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0849802Z 
2025-12-04T10:35:20.0850236Z [W1204 10:21:24.020759475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0850800Z 
2025-12-04T10:35:20.0851239Z [W1204 10:21:24.021125812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0851789Z 
2025-12-04T10:35:20.0852236Z [W1204 10:21:24.021293376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0852789Z 
2025-12-04T10:35:20.0853232Z [W1204 10:21:24.021666163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0853782Z 
2025-12-04T10:35:20.0854216Z [W1204 10:21:24.021833726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0854825Z 
2025-12-04T10:35:20.0855256Z [W1204 10:21:24.022153082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0855904Z 
2025-12-04T10:35:20.0856333Z [W1204 10:21:24.022318346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0856887Z 
2025-12-04T10:35:20.0857329Z [W1204 10:21:24.022617902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0857881Z 
2025-12-04T10:35:20.0858318Z [W1204 10:21:24.022780575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0858866Z 
2025-12-04T10:35:20.0859348Z [W1204 10:21:24.023081891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0859907Z 
2025-12-04T10:35:20.0860338Z [W1204 10:21:24.023264954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0860899Z 
2025-12-04T10:35:20.0861006Z ('RERUN', {'yellow': True}) [0.6955s] [100%]
2025-12-04T10:35:20.0862027Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:24.717025252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0862934Z 
2025-12-04T10:35:20.0863378Z [W1204 10:21:24.717378399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0863932Z 
2025-12-04T10:35:20.0864361Z [W1204 10:21:24.717546002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0864921Z 
2025-12-04T10:35:20.0865349Z [W1204 10:21:24.718013861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0865903Z 
2025-12-04T10:35:20.0866337Z [W1204 10:21:24.718188945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0866887Z 
2025-12-04T10:35:20.0867412Z [W1204 10:21:24.718487601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0867965Z 
2025-12-04T10:35:20.0868403Z [W1204 10:21:24.718730325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0868956Z 
2025-12-04T10:35:20.0869384Z [W1204 10:21:24.718891319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0869945Z 
2025-12-04T10:35:20.0870374Z [W1204 10:21:24.719281516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0870931Z 
2025-12-04T10:35:20.0871365Z [W1204 10:21:24.719450250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0871915Z 
2025-12-04T10:35:20.0872350Z [W1204 10:21:24.719817857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0872906Z 
2025-12-04T10:35:20.0873340Z [W1204 10:21:24.719983540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0873896Z 
2025-12-04T10:35:20.0874327Z [W1204 10:21:24.720340757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0874879Z 
2025-12-04T10:35:20.0875359Z [W1204 10:21:24.720510120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0875968Z 
2025-12-04T10:35:20.0876397Z [W1204 10:21:24.720812906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0876992Z 
2025-12-04T10:35:20.0877429Z [W1204 10:21:24.720978199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0877982Z 
2025-12-04T10:35:20.0878422Z [W1204 10:21:24.721287535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0878973Z 
2025-12-04T10:35:20.0879404Z [W1204 10:21:24.721452069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0879959Z 
2025-12-04T10:35:20.0880039Z FAILED [0.7148s] [100%]
2025-12-04T10:35:20.0880196Z 
2025-12-04T10:35:20.0880316Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0880809Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0881267Z Traceback (most recent call last):
2025-12-04T10:35:20.0881797Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0882368Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.0883026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0883765Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0884537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0885265Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0886026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0886711Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0887409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0888266Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0889193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0889883Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0890531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0891159Z     return self._compile_to_module()
2025-12-04T10:35:20.0891764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0892442Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0893144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0893810Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0894457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0895199Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0896023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0896752Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0897426Z   File "/tmp/tmpmuh47gt6/xf/cxfpjbopqoo6er7nay4wy7kqrqyhdkgfou7cxikujsoorskbn76t.py", line 193, in <module>
2025-12-04T10:35:20.0898437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0899097Z     self._wait_futures(scope)
2025-12-04T10:35:20.0899698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0900414Z     kernel = result.result()
2025-12-04T10:35:20.0900966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0901557Z     return self.result_fn()
2025-12-04T10:35:20.0902143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0902790Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0903342Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0903803Z 
2025-12-04T10:35:20.0903987Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0904389Z Traceback (most recent call last):
2025-12-04T10:35:20.0905059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0905768Z     result = job()
2025-12-04T10:35:20.0906421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0907172Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0908035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0908726Z     self._precompile_worker()
2025-12-04T10:35:20.0909412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0910200Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0910973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0911769Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0912447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0913162Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0913987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0914774Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0915335Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0916069Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0916675Z ^
2025-12-04T10:35:20.0917161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0917166Z 
2025-12-04T10:35:20.0917171Z 
2025-12-04T10:35:20.0917853Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0917865Z 
2025-12-04T10:35:20.0917870Z 
2025-12-04T10:35:20.0918116Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0918892Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0918911Z 
2025-12-04T10:35:20.0919206Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0919392Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0919575Z frames [('total', 1)]
2025-12-04T10:35:20.0919673Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0920242Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0920494Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0920573Z graph_break []
2025-12-04T10:35:20.0920730Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0920911Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.0921960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.0922065Z   if out == self.unknown_value:
2025-12-04T10:35:20.0922306Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0922408Z Traceback (most recent call last):
2025-12-04T10:35:20.0922747Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0922866Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.0923290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0923505Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0923942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0924108Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0924543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0924671Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0925124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0925398Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0925851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0925974Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0926467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0926572Z     return self._compile_to_module()
2025-12-04T10:35:20.0926983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0927130Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0927574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0927680Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0928101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0928300Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0928808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0928911Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0929357Z   File "/tmp/tmpkmvylrrf/em/cemmtpb4skdnjxt2ufsdlm7xfsxvsgbunm3eh5n6njfmsvuxg3my.py", line 193, in <module>
2025-12-04T10:35:20.0929748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0929847Z     self._wait_futures(scope)
2025-12-04T10:35:20.0930308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0930410Z     kernel = result.result()
2025-12-04T10:35:20.0930826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0930931Z     return self.result_fn()
2025-12-04T10:35:20.0931338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0931449Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0931777Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0931783Z 
2025-12-04T10:35:20.0932061Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0932167Z Traceback (most recent call last):
2025-12-04T10:35:20.0932623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0932711Z     result = job()
2025-12-04T10:35:20.0933215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0933335Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0933803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0933907Z     self._precompile_worker()
2025-12-04T10:35:20.0934408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0934568Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0935071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0935236Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0935622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0935828Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0936213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0936583Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0936742Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0937164Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0937233Z ^
2025-12-04T10:35:20.0937618Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0937629Z 
2025-12-04T10:35:20.0937633Z 
2025-12-04T10:35:20.0938236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0938243Z 
2025-12-04T10:35:20.0938247Z 
2025-12-04T10:35:20.0938431Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0939088Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0939093Z 
2025-12-04T10:35:20.0939317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0939500Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0939584Z frames [('total', 1)]
2025-12-04T10:35:20.0939681Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0940257Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0940489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0940609Z graph_break []
2025-12-04T10:35:20.0940758Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0940938Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.0941985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.0942080Z   if out == self.unknown_value:
2025-12-04T10:35:20.0942259Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0942349Z frames [('total', 1)]
2025-12-04T10:35:20.0942445Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0942640Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0943203Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0943283Z graph_break []
2025-12-04T10:35:20.0943431Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0943548Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0943791Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0943896Z Traceback (most recent call last):
2025-12-04T10:35:20.0944225Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0944347Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.0944762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0944973Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0945437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0945627Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0946074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0946276Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0946729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0947015Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0947457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0947587Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0948007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0948110Z     return self._compile_to_module()
2025-12-04T10:35:20.0948532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0948669Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0949113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0949227Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0949645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0949850Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0950423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0950525Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0951012Z   File "/tmp/tmpt5hrd8p2/ef/cefo55iyjfzqwrbb6wixkhlpge6vd5bfdnzv37ebwsiu33u3x45j.py", line 193, in <module>
2025-12-04T10:35:20.0951395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0951493Z     self._wait_futures(scope)
2025-12-04T10:35:20.0951924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0952020Z     kernel = result.result()
2025-12-04T10:35:20.0952405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0952500Z     return self.result_fn()
2025-12-04T10:35:20.0952909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0953024Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0953351Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0953356Z 
2025-12-04T10:35:20.0953540Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0953646Z Traceback (most recent call last):
2025-12-04T10:35:20.0954117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0954204Z     result = job()
2025-12-04T10:35:20.0954703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0954820Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0955292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0955388Z     self._precompile_worker()
2025-12-04T10:35:20.0955898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0956046Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0956631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0956802Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0957184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0957386Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0957764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0958046Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0958205Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0958627Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0958696Z ^
2025-12-04T10:35:20.0959096Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0959101Z 
2025-12-04T10:35:20.0959105Z 
2025-12-04T10:35:20.0959711Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0959716Z 
2025-12-04T10:35:20.0959720Z 
2025-12-04T10:35:20.0959911Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0960551Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0960556Z 
2025-12-04T10:35:20.0960788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0961009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0961088Z frames [('total', 1)]
2025-12-04T10:35:20.0961191Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0961766Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0961960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0962045Z graph_break []
2025-12-04T10:35:20.0962189Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0962371Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.0963408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.0963504Z   if out == self.unknown_value:
2025-12-04T10:35:20.0963684Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0963771Z frames [('total', 1)]
2025-12-04T10:35:20.0963876Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0964062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0964619Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0964709Z graph_break []
2025-12-04T10:35:20.0964856Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0965030Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0965116Z frames [('total', 1)]
2025-12-04T10:35:20.0965206Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0965394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0966001Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0966159Z graph_break []
2025-12-04T10:35:20.0966314Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0966868Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml -
2025-12-04T10:35:20.0967010Z =========================== short test summary info ============================
2025-12-04T10:35:20.0967767Z FAILED [0.7148s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0967772Z 
2025-12-04T10:35:20.0967949Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0968059Z Traceback (most recent call last):
2025-12-04T10:35:20.0968527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0968610Z     result = job()
2025-12-04T10:35:20.0969134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0969254Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0969736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0969872Z     self._precompile_worker()
2025-12-04T10:35:20.0970378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0970535Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0971080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0971254Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0971645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0971857Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0972244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0972533Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0972696Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0973128Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0973201Z ^
2025-12-04T10:35:20.0973604Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0973609Z 
2025-12-04T10:35:20.0973613Z 
2025-12-04T10:35:20.0974226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0974230Z 
2025-12-04T10:35:20.0974234Z 
2025-12-04T10:35:20.0974427Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0975027Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0975034Z 
2025-12-04T10:35:20.0975260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0975430Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0975616Z ================= 1 failed, 187 deselected, 2 rerun in 13.28s ==================
2025-12-04T10:35:20.0975721Z Got exit code 1
2025-12-04T10:35:20.0975822Z Retrying single test...
2025-12-04T10:35:20.0976322Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml
2025-12-04T10:35:20.0976473Z ============================= test session starts ==============================
2025-12-04T10:35:20.0976766Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0976856Z cachedir: .pytest_cache
2025-12-04T10:35:20.0977311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0977423Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0977528Z configfile: pytest.ini
2025-12-04T10:35:20.0977991Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0978179Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0978718Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0978813Z Running 1 items in this shard
2025-12-04T10:35:20.0978818Z 
2025-12-04T10:35:20.0979703Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:33.898327559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0979708Z 
2025-12-04T10:35:20.0980148Z [W1204 10:21:43.423726291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0980197Z 
2025-12-04T10:35:20.0980634Z [W1204 10:21:43.423962555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0980687Z 
2025-12-04T10:35:20.0981119Z [W1204 10:21:43.426211739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0981124Z 
2025-12-04T10:35:20.0981557Z [W1204 10:21:43.426400383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0981562Z 
2025-12-04T10:35:20.0982002Z [W1204 10:21:43.428510754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0982006Z 
2025-12-04T10:35:20.0982434Z [W1204 10:21:43.428793240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0982441Z 
2025-12-04T10:35:20.0982880Z [W1204 10:21:43.428953803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0982888Z 
2025-12-04T10:35:20.0983324Z [W1204 10:21:43.429358051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0983328Z 
2025-12-04T10:35:20.0983773Z [W1204 10:21:43.429527394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0983779Z 
2025-12-04T10:35:20.0984211Z [W1204 10:21:43.429985833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0984216Z 
2025-12-04T10:35:20.0984659Z [W1204 10:21:43.430208658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0984666Z 
2025-12-04T10:35:20.0985092Z [W1204 10:21:43.430569305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0985099Z 
2025-12-04T10:35:20.0985536Z [W1204 10:21:43.430734908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0985541Z 
2025-12-04T10:35:20.0986082Z [W1204 10:21:43.431045164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0986087Z 
2025-12-04T10:35:20.0986523Z [W1204 10:21:43.431217547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0986527Z 
2025-12-04T10:35:20.0986969Z [W1204 10:21:43.431533303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0986976Z 
2025-12-04T10:35:20.0987404Z [W1204 10:21:43.431703827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0987410Z 
2025-12-04T10:35:20.0987525Z ('RERUN', {'yellow': True}) [11.7088s] [100%]
2025-12-04T10:35:20.0988329Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:44.559025563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0988338Z 
2025-12-04T10:35:20.0988785Z [W1204 10:21:44.559387380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0988789Z 
2025-12-04T10:35:20.0989227Z [W1204 10:21:44.559555853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0989275Z 
2025-12-04T10:35:20.0989706Z [W1204 10:21:44.560054593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0989722Z 
2025-12-04T10:35:20.0990152Z [W1204 10:21:44.560232396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0990199Z 
2025-12-04T10:35:20.0990637Z [W1204 10:21:44.560530222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0990641Z 
2025-12-04T10:35:20.0991084Z [W1204 10:21:44.560780787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0991088Z 
2025-12-04T10:35:20.0991517Z [W1204 10:21:44.560937820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0991521Z 
2025-12-04T10:35:20.0991963Z [W1204 10:21:44.561296617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0991967Z 
2025-12-04T10:35:20.0992402Z [W1204 10:21:44.561462610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0992408Z 
2025-12-04T10:35:20.0992851Z [W1204 10:21:44.561834858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0992855Z 
2025-12-04T10:35:20.0993292Z [W1204 10:21:44.562000031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0993296Z 
2025-12-04T10:35:20.0993729Z [W1204 10:21:44.562321207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0993743Z 
2025-12-04T10:35:20.0994181Z [W1204 10:21:44.562484950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0994188Z 
2025-12-04T10:35:20.0994614Z [W1204 10:21:44.562782536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0994621Z 
2025-12-04T10:35:20.0995051Z [W1204 10:21:44.562949909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0995055Z 
2025-12-04T10:35:20.0995569Z [W1204 10:21:44.563259285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0995574Z 
2025-12-04T10:35:20.0996023Z [W1204 10:21:44.563430759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0996027Z 
2025-12-04T10:35:20.0996135Z ('RERUN', {'yellow': True}) [0.6979s] [100%]
2025-12-04T10:35:20.0996937Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:45.271283411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0996945Z 
2025-12-04T10:35:20.0997379Z [W1204 10:21:45.271658639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0997383Z 
2025-12-04T10:35:20.0997829Z [W1204 10:21:45.271827962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0997834Z 
2025-12-04T10:35:20.0998259Z [W1204 10:21:45.273514045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0998264Z 
2025-12-04T10:35:20.0998696Z [W1204 10:21:45.273690968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0998742Z 
2025-12-04T10:35:20.0999191Z [W1204 10:21:45.274029415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0999196Z 
2025-12-04T10:35:20.0999668Z [W1204 10:21:45.274302490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0999672Z 
2025-12-04T10:35:20.1000124Z [W1204 10:21:45.274465444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1000128Z 
2025-12-04T10:35:20.1000559Z [W1204 10:21:45.274906802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1000564Z 
2025-12-04T10:35:20.1001005Z [W1204 10:21:45.275076675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1001012Z 
2025-12-04T10:35:20.1001446Z [W1204 10:21:45.275514584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1001450Z 
2025-12-04T10:35:20.1001890Z [W1204 10:21:45.275683797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1001898Z 
2025-12-04T10:35:20.1002326Z [W1204 10:21:45.276231038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1002335Z 
2025-12-04T10:35:20.1002763Z [W1204 10:21:45.276401051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1002774Z 
2025-12-04T10:35:20.1003212Z [W1204 10:21:45.276861950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1003216Z 
2025-12-04T10:35:20.1003653Z [W1204 10:21:45.277030664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1003657Z 
2025-12-04T10:35:20.1004092Z [W1204 10:21:45.277399771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1004099Z 
2025-12-04T10:35:20.1004530Z [W1204 10:21:45.277567154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1004534Z 
2025-12-04T10:35:20.1004703Z FAILED [0.7096s] [100%]
2025-12-04T10:35:20.1004708Z 
2025-12-04T10:35:20.1004833Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1005077Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.1005192Z Traceback (most recent call last):
2025-12-04T10:35:20.1005529Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1005685Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1006125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1006339Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1006787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1006954Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1007408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1007531Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1008176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1008525Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1008966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1009145Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1009561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1009662Z     return self._compile_to_module()
2025-12-04T10:35:20.1010093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1010228Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1010665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1010776Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1011196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1011403Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1011976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1012084Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1012535Z   File "/tmp/tmpt3ffkjt4/wu/cwulgp2m4lgii7pneh6iwn2fog3jajfg7bbwpv7q5q7ouztflghj.py", line 193, in <module>
2025-12-04T10:35:20.1012925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1013022Z     self._wait_futures(scope)
2025-12-04T10:35:20.1013453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1013548Z     kernel = result.result()
2025-12-04T10:35:20.1013937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1014034Z     return self.result_fn()
2025-12-04T10:35:20.1014444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1014562Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1014891Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1014897Z 
2025-12-04T10:35:20.1015193Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1015294Z Traceback (most recent call last):
2025-12-04T10:35:20.1015751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1015835Z     result = job()
2025-12-04T10:35:20.1016331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1016452Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1016928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1017029Z     self._precompile_worker()
2025-12-04T10:35:20.1017551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1017704Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1018213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1018393Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1018778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1019074Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1019452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1019780Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1019945Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1020369Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1020436Z ^
2025-12-04T10:35:20.1020836Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1020841Z 
2025-12-04T10:35:20.1020845Z 
2025-12-04T10:35:20.1021455Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1021463Z 
2025-12-04T10:35:20.1021467Z 
2025-12-04T10:35:20.1021650Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1022251Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1022258Z 
2025-12-04T10:35:20.1022488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1022672Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1022761Z frames [('total', 1)]
2025-12-04T10:35:20.1022866Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1023430Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1023625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1023704Z graph_break []
2025-12-04T10:35:20.1023852Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1024044Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1025086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1025298Z   if out == self.unknown_value:
2025-12-04T10:35:20.1025554Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.1025657Z Traceback (most recent call last):
2025-12-04T10:35:20.1025999Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1026117Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1026535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1026756Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1027191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1027364Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1027797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1027917Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1028384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1028656Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1029097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1029268Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1029669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1029820Z     return self._compile_to_module()
2025-12-04T10:35:20.1030227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1030370Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1030810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1030917Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1031340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1031543Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1032040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1032158Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1032596Z   File "/tmp/tmpambjhoyd/l4/cl4cyoufkny46ifn7zy4my4osg3vzcqmkwieubma7tvyppx4f7v2.py", line 193, in <module>
2025-12-04T10:35:20.1032980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1033088Z     self._wait_futures(scope)
2025-12-04T10:35:20.1033516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1033608Z     kernel = result.result()
2025-12-04T10:35:20.1033987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1034079Z     return self.result_fn()
2025-12-04T10:35:20.1034494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1034605Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1034935Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1034943Z 
2025-12-04T10:35:20.1035116Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1035214Z Traceback (most recent call last):
2025-12-04T10:35:20.1035809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1035890Z     result = job()
2025-12-04T10:35:20.1036388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1036506Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1036978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1037071Z     self._precompile_worker()
2025-12-04T10:35:20.1037571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1037724Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1038233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1038398Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1038783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1038985Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1039363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1039780Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1039935Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1040390Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1040470Z ^
2025-12-04T10:35:20.1040860Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1040865Z 
2025-12-04T10:35:20.1040870Z 
2025-12-04T10:35:20.1041481Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1041486Z 
2025-12-04T10:35:20.1041490Z 
2025-12-04T10:35:20.1041667Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1042271Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1042275Z 
2025-12-04T10:35:20.1042500Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1042676Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1042764Z frames [('total', 1)]
2025-12-04T10:35:20.1042856Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1043422Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1043615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1043694Z graph_break []
2025-12-04T10:35:20.1043840Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1044023Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1045065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1045169Z   if out == self.unknown_value:
2025-12-04T10:35:20.1045343Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1045506Z frames [('total', 1)]
2025-12-04T10:35:20.1045601Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1045810Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1046404Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1046486Z graph_break []
2025-12-04T10:35:20.1046634Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1046758Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1046998Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.1047105Z Traceback (most recent call last):
2025-12-04T10:35:20.1047434Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1047551Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1047965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1048177Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1048613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1048784Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1049256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1049377Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1049875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1050145Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1050593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1050712Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1051120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1051217Z     return self._compile_to_module()
2025-12-04T10:35:20.1051631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1051766Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1052198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1052305Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1052724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1052920Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1053420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1053520Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1053946Z   File "/tmp/tmpfyq3_txa/yj/cyjxtrojtuefmvvz55mw3yodhqgvovybyvjxpy3euykm72uc2sv7.py", line 193, in <module>
2025-12-04T10:35:20.1054334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1054425Z     self._wait_futures(scope)
2025-12-04T10:35:20.1054857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1054952Z     kernel = result.result()
2025-12-04T10:35:20.1055333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1055540Z     return self.result_fn()
2025-12-04T10:35:20.1055962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1056065Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1056390Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1056394Z 
2025-12-04T10:35:20.1056566Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1060925Z Traceback (most recent call last):
2025-12-04T10:35:20.1061420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1061505Z     result = job()
2025-12-04T10:35:20.1062017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1062136Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1062634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1062737Z     self._precompile_worker()
2025-12-04T10:35:20.1063246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1063413Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1063995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1064162Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1064621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1064825Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1065217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1065526Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1065708Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1066144Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1066220Z ^
2025-12-04T10:35:20.1066612Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1066617Z 
2025-12-04T10:35:20.1066623Z 
2025-12-04T10:35:20.1067231Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1067236Z 
2025-12-04T10:35:20.1067240Z 
2025-12-04T10:35:20.1067423Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1068022Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1068027Z 
2025-12-04T10:35:20.1068248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1068436Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1068522Z frames [('total', 1)]
2025-12-04T10:35:20.1068615Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1069183Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1069371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1069466Z graph_break []
2025-12-04T10:35:20.1069618Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1069886Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1070933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1071029Z   if out == self.unknown_value:
2025-12-04T10:35:20.1071215Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1071299Z frames [('total', 1)]
2025-12-04T10:35:20.1071393Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1071590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1072155Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1072246Z graph_break []
2025-12-04T10:35:20.1072400Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1072574Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1072666Z frames [('total', 1)]
2025-12-04T10:35:20.1072760Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1072949Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1073561Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1073639Z graph_break []
2025-12-04T10:35:20.1073822Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1074388Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml -
2025-12-04T10:35:20.1074533Z =========================== short test summary info ============================
2025-12-04T10:35:20.1075294Z FAILED [0.7096s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1075299Z 
2025-12-04T10:35:20.1075487Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1075598Z Traceback (most recent call last):
2025-12-04T10:35:20.1076101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1076181Z     result = job()
2025-12-04T10:35:20.1076695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1076816Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1077289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1077397Z     self._precompile_worker()
2025-12-04T10:35:20.1077908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1078058Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1078567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1078732Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1079114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1079318Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1079692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1080065Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1080225Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1080657Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1080728Z ^
2025-12-04T10:35:20.1081118Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1081125Z 
2025-12-04T10:35:20.1081129Z 
2025-12-04T10:35:20.1081742Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1081750Z 
2025-12-04T10:35:20.1081754Z 
2025-12-04T10:35:20.1081933Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1082540Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1082545Z 
2025-12-04T10:35:20.1082772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1082930Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1083102Z ================= 1 failed, 187 deselected, 2 rerun in 13.15s ==================
2025-12-04T10:35:20.1083224Z Got exit code 1
2025-12-04T10:35:20.1083616Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1083964Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.1084403Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml
2025-12-04T10:35:20.1084544Z ============================= test session starts ==============================
2025-12-04T10:35:20.1084846Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1084945Z cachedir: .pytest_cache
2025-12-04T10:35:20.1085389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1085491Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1085604Z configfile: pytest.ini
2025-12-04T10:35:20.1086106Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1086296Z collecting ... collected 188 items / 22 deselected / 166 selected
2025-12-04T10:35:20.1086427Z stepcurrent: skipping 22 already run items.
2025-12-04T10:35:20.1086522Z Running 166 items in this shard
2025-12-04T10:35:20.1086526Z 
2025-12-04T10:35:20.1087534Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1088339Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1088802Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1089277Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.1089694Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.1090059Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.1090537Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x1 = (xindex % ks1)
2025-12-04T10:35:20.1091056Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x2 = triton_helpers.div_floor_integer(xindex,  ks1)
2025-12-04T10:35:20.1091528Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T10:35:20.1091891Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = x0
2025-12-04T10:35:20.1092375Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T10:35:20.1092806Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp2.to(tl.float32)
2025-12-04T10:35:20.1093260Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T10:35:20.1093838Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T10:35:20.1094138Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1095831Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1096334Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1097227Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1097763Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1098524Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1099167Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1099982Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1100682Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1101241Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1102100Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1102507Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1103277Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1103389Z ('RERUN', {'yellow': True}) [2.1723s] [  0%]
2025-12-04T10:35:20.1104378Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1105165Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1105669Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1106158Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.1106573Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.1106938Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.1107397Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x1 = (xindex % ks1)
2025-12-04T10:35:20.1108204Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x2 = triton_helpers.div_floor_integer(xindex,  ks1)
2025-12-04T10:35:20.1108689Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T10:35:20.1109041Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = x0
2025-12-04T10:35:20.1109519Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T10:35:20.1109951Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp2.to(tl.float32)
2025-12-04T10:35:20.1110400Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T10:35:20.1110963Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T10:35:20.1111265Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1112898Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1113352Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1114239Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1114896Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1115662Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1116239Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1116997Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1117648Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1118167Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1118961Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1119322Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1120085Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1120247Z ('RERUN', {'yellow': True}) [0.4287s] [  0%]
2025-12-04T10:35:20.1121234Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1122023Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1122481Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1122963Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.1123380Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.1123749Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.1124150Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x1 = (xindex % ks1)
2025-12-04T10:35:20.1124651Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x2 = triton_helpers.div_floor_integer(xindex,  ks1)
2025-12-04T10:35:20.1125122Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T10:35:20.1125476Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = x0
2025-12-04T10:35:20.1125951Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T10:35:20.1126380Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp2.to(tl.float32)
2025-12-04T10:35:20.1126907Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T10:35:20.1127483Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T10:35:20.1127784Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1129421Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1129876Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1130767Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1131339Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1132095Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1132713Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1133457Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1134117Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1134634Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1135432Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1135741Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1136501Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1136587Z FAILED [0.4277s] [  0%]
2025-12-04T10:35:20.1136592Z 
2025-12-04T10:35:20.1136714Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1136957Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1137058Z Traceback (most recent call last):
2025-12-04T10:35:20.1137396Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1137513Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1138003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1138219Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1138659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1138821Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1139328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1139454Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1139920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1140191Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1140635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1140765Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1141168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1141271Z     return self._compile_to_module()
2025-12-04T10:35:20.1141680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1141863Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1142307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1142457Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1142876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1143077Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1143587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1143696Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1144128Z   File "/tmp/tmpw1bmfch9/yx/cyxlu4hzpv7kciwzh33qgdxvtkvckv7cr5jucrxqo7oi5d2sdr2n.py", line 60, in <module>
2025-12-04T10:35:20.1144524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1144623Z     kernel.precompile(
2025-12-04T10:35:20.1145093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1145194Z     self._precompile_worker()
2025-12-04T10:35:20.1145740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1145902Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1146419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1146587Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1146968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1147180Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1147561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1147850Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1148047Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1148473Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1148660Z ^
2025-12-04T10:35:20.1149052Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1149057Z 
2025-12-04T10:35:20.1149670Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1149678Z 
2025-12-04T10:35:20.1149682Z 
2025-12-04T10:35:20.1149861Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1150449Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1150464Z 
2025-12-04T10:35:20.1150690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1150877Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1150972Z frames [('total', 1)]
2025-12-04T10:35:20.1151067Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1151530Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1151727Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1151806Z graph_break []
2025-12-04T10:35:20.1152027Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1152270Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1152370Z Traceback (most recent call last):
2025-12-04T10:35:20.1152751Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1152867Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1153276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1153497Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1153933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1154098Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1154534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1154654Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1155111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1155386Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1155891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1156017Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1156424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1156530Z     return self._compile_to_module()
2025-12-04T10:35:20.1156939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1157077Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1157517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1157626Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1158053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1158245Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1158827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1158943Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1159367Z   File "/tmp/tmpcm0pa6f9/zn/czn66v4xmhea5twk6qxq65kb4b7kbketol6ch6z6h4du7mkb7z5h.py", line 60, in <module>
2025-12-04T10:35:20.1159767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1159861Z     kernel.precompile(
2025-12-04T10:35:20.1160339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1160441Z     self._precompile_worker()
2025-12-04T10:35:20.1160949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1161095Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1161615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1161781Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1162169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1162371Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1162789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1163078Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1163311Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1163739Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1163819Z ^
2025-12-04T10:35:20.1164207Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1164212Z 
2025-12-04T10:35:20.1164822Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1164827Z 
2025-12-04T10:35:20.1164834Z 
2025-12-04T10:35:20.1165012Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1165603Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1165610Z 
2025-12-04T10:35:20.1165832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1166016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1166113Z frames [('total', 1)]
2025-12-04T10:35:20.1166209Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1166685Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1166877Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1166960Z graph_break []
2025-12-04T10:35:20.1167117Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1167298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1167382Z frames [('total', 1)]
2025-12-04T10:35:20.1167483Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1167671Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1168143Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1168221Z graph_break []
2025-12-04T10:35:20.1168451Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1168582Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1168821Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1168921Z Traceback (most recent call last):
2025-12-04T10:35:20.1169264Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1169390Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1169817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1170033Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1170473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1170654Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1171087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1171207Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1171683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1172004Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1172460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1172625Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1173042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1173156Z     return self._compile_to_module()
2025-12-04T10:35:20.1173574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1173726Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1174166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1174274Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1174719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1174917Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1175419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1175536Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1175979Z   File "/tmp/tmpcbu5hy36/qo/cqoehnxgjew6n6n6bk3nvdevhbwxdxvtykfp2p7hz6f2cyn4sbzv.py", line 60, in <module>
2025-12-04T10:35:20.1176387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1176477Z     kernel.precompile(
2025-12-04T10:35:20.1176947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1177058Z     self._precompile_worker()
2025-12-04T10:35:20.1177566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1177731Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1178247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1178418Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1178903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1179172Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1179547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1179837Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1180033Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1180469Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1180541Z ^
2025-12-04T10:35:20.1180928Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1180933Z 
2025-12-04T10:35:20.1181557Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1181561Z 
2025-12-04T10:35:20.1181565Z 
2025-12-04T10:35:20.1181749Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1182348Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1182397Z 
2025-12-04T10:35:20.1182627Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1182811Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1182936Z frames [('total', 1)]
2025-12-04T10:35:20.1183034Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1183500Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1183699Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1183782Z graph_break []
2025-12-04T10:35:20.1183939Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1184119Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1184213Z frames [('total', 1)]
2025-12-04T10:35:20.1184313Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1184497Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1184957Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1185038Z graph_break []
2025-12-04T10:35:20.1185182Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1185368Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1185450Z frames [('total', 1)]
2025-12-04T10:35:20.1185548Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1185739Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1186188Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1186276Z graph_break []
2025-12-04T10:35:20.1186420Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1186980Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml -
2025-12-04T10:35:20.1187124Z =========================== short test summary info ============================
2025-12-04T10:35:20.1187716Z FAILED [0.4277s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1188250Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1188325Z ^
2025-12-04T10:35:20.1188712Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1188717Z 
2025-12-04T10:35:20.1189335Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1189344Z 
2025-12-04T10:35:20.1189348Z 
2025-12-04T10:35:20.1189524Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1190117Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1190124Z 
2025-12-04T10:35:20.1190345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1190501Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1190671Z ================== 1 failed, 22 deselected, 2 rerun in 3.06s ===================
2025-12-04T10:35:20.1190751Z Got exit code 1
2025-12-04T10:35:20.1190844Z Retrying single test...
2025-12-04T10:35:20.1191248Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml
2025-12-04T10:35:20.1191383Z ============================= test session starts ==============================
2025-12-04T10:35:20.1191725Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1191816Z cachedir: .pytest_cache
2025-12-04T10:35:20.1192262Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1192422Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1192508Z configfile: pytest.ini
2025-12-04T10:35:20.1192973Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1193156Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.1193669Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1193765Z Running 1 items in this shard
2025-12-04T10:35:20.1193771Z 
2025-12-04T10:35:20.1194558Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:05.371903031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1194565Z 
2025-12-04T10:35:20.1195013Z [W1204 10:22:14.821799036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1195018Z 
2025-12-04T10:35:20.1195489Z [W1204 10:22:14.822031080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1195495Z 
2025-12-04T10:35:20.1195944Z [W1204 10:22:14.824322815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1195949Z 
2025-12-04T10:35:20.1196377Z [W1204 10:22:14.824508569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1196384Z 
2025-12-04T10:35:20.1196812Z [W1204 10:22:14.826608730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1196819Z 
2025-12-04T10:35:20.1197246Z [W1204 10:22:14.826890605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1197250Z 
2025-12-04T10:35:20.1197758Z [W1204 10:22:14.827053789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1197763Z 
2025-12-04T10:35:20.1198201Z [W1204 10:22:14.827462137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1198205Z 
2025-12-04T10:35:20.1198635Z [W1204 10:22:14.827634680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1198641Z 
2025-12-04T10:35:20.1199073Z [W1204 10:22:14.828094209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1199078Z 
2025-12-04T10:35:20.1199504Z [W1204 10:22:14.828265022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1199511Z 
2025-12-04T10:35:20.1199940Z [W1204 10:22:14.828612259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1199948Z 
2025-12-04T10:35:20.1200377Z [W1204 10:22:14.828789143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1200382Z 
2025-12-04T10:35:20.1200811Z [W1204 10:22:14.829109049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1200815Z 
2025-12-04T10:35:20.1201285Z [W1204 10:22:14.829278062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1201289Z 
2025-12-04T10:35:20.1201717Z [W1204 10:22:14.829598308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1201767Z 
2025-12-04T10:35:20.1202200Z [W1204 10:22:14.829764192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1202205Z 
2025-12-04T10:35:20.1202321Z ('RERUN', {'yellow': True}) [11.6561s] [100%]
2025-12-04T10:35:20.1203108Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:15.961341030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1203114Z 
2025-12-04T10:35:20.1203553Z [W1204 10:22:15.961694337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1203560Z 
2025-12-04T10:35:20.1203988Z [W1204 10:22:15.961859811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1203995Z 
2025-12-04T10:35:20.1204422Z [W1204 10:22:15.962320090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1204427Z 
2025-12-04T10:35:20.1204859Z [W1204 10:22:15.962496853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1204863Z 
2025-12-04T10:35:20.1205293Z [W1204 10:22:15.962815409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1205297Z 
2025-12-04T10:35:20.1205747Z [W1204 10:22:15.963062644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1205763Z 
2025-12-04T10:35:20.1206221Z [W1204 10:22:15.963232837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1206227Z 
2025-12-04T10:35:20.1206655Z [W1204 10:22:15.963602535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1206659Z 
2025-12-04T10:35:20.1207254Z [W1204 10:22:15.963767098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1207259Z 
2025-12-04T10:35:20.1207687Z [W1204 10:22:15.964136065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1207691Z 
2025-12-04T10:35:20.1208264Z [W1204 10:22:15.964302658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1208271Z 
2025-12-04T10:35:20.1208779Z [W1204 10:22:15.964616875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1208790Z 
2025-12-04T10:35:20.1209374Z [W1204 10:22:15.964786198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1209380Z 
2025-12-04T10:35:20.1209849Z [W1204 10:22:15.965086194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1209854Z 
2025-12-04T10:35:20.1210292Z [W1204 10:22:15.965249777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1210296Z 
2025-12-04T10:35:20.1210723Z [W1204 10:22:15.965556343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1210814Z 
2025-12-04T10:35:20.1211243Z [W1204 10:22:15.965722326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1211252Z 
2025-12-04T10:35:20.1211415Z ('RERUN', {'yellow': True}) [0.6952s] [100%]
2025-12-04T10:35:20.1212201Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:16.662014485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1212206Z 
2025-12-04T10:35:20.1212645Z [W1204 10:22:16.662367672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1212649Z 
2025-12-04T10:35:20.1213078Z [W1204 10:22:16.662538115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1213082Z 
2025-12-04T10:35:20.1213515Z [W1204 10:22:16.663007385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1213522Z 
2025-12-04T10:35:20.1213950Z [W1204 10:22:16.663188988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1213956Z 
2025-12-04T10:35:20.1214387Z [W1204 10:22:16.663490984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1214391Z 
2025-12-04T10:35:20.1214826Z [W1204 10:22:16.663739339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1214830Z 
2025-12-04T10:35:20.1215259Z [W1204 10:22:16.663904892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1215271Z 
2025-12-04T10:35:20.1215720Z [W1204 10:22:16.664288490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1215730Z 
2025-12-04T10:35:20.1216180Z [W1204 10:22:16.664455753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1216186Z 
2025-12-04T10:35:20.1216616Z [W1204 10:22:16.664829230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1216620Z 
2025-12-04T10:35:20.1217156Z [W1204 10:22:16.664996894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1217162Z 
2025-12-04T10:35:20.1217593Z [W1204 10:22:16.665318160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1217597Z 
2025-12-04T10:35:20.1218028Z [W1204 10:22:16.665495413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1218035Z 
2025-12-04T10:35:20.1218467Z [W1204 10:22:16.666068545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1218474Z 
2025-12-04T10:35:20.1218900Z [W1204 10:22:16.666240498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1218904Z 
2025-12-04T10:35:20.1219403Z [W1204 10:22:16.667764378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1219408Z 
2025-12-04T10:35:20.1219837Z [W1204 10:22:16.667941201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1219842Z 
2025-12-04T10:35:20.1220264Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] Failed to remove temporary cache dir at /tmp/tmpnm6gjk78
2025-12-04T10:35:20.1220651Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] Traceback (most recent call last):
2025-12-04T10:35:20.1221185Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354]   File "/opt/conda/envs/py_3.10/lib/python3.10/shutil.py", line 662, in _rmtree_safe_fd
2025-12-04T10:35:20.1221603Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354]     os.rmdir(entry.name, dir_fd=topfd)
2025-12-04T10:35:20.1222288Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] OSError: [Errno 39] Directory not empty: 'D7AGGDEZNIS5BGFNPDIKXLYBRZFN3WBF3ZTPZHJPZOI2QCCFJSMA'
2025-12-04T10:35:20.1222839Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Failed to remove temporary cache dir at /tmp/tmpnm6gjk78
2025-12-04T10:35:20.1223283Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Traceback (most recent call last):
2025-12-04T10:35:20.1223861Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354]   File "/opt/conda/envs/py_3.10/lib/python3.10/shutil.py", line 662, in _rmtree_safe_fd
2025-12-04T10:35:20.1224217Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354]     os.rmdir(entry.name, dir_fd=topfd)
2025-12-04T10:35:20.1224607Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] OSError: [Errno 39] Directory not empty: 'triton'
2025-12-04T10:35:20.1225025Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Failed to remove temporary cache dir at /tmp/tmpnm6gjk78
2025-12-04T10:35:20.1225369Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Traceback (most recent call last):
2025-12-04T10:35:20.1225905Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354]   File "/opt/conda/envs/py_3.10/lib/python3.10/shutil.py", line 729, in rmtree
2025-12-04T10:35:20.1226175Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354]     os.rmdir(path)
2025-12-04T10:35:20.1226603Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] OSError: [Errno 39] Directory not empty: '/tmp/tmpnm6gjk78'
2025-12-04T10:35:20.1226684Z FAILED [0.7106s] [100%]
2025-12-04T10:35:20.1226695Z 
2025-12-04T10:35:20.1226811Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1227049Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1227153Z Traceback (most recent call last):
2025-12-04T10:35:20.1227583Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1227703Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1228121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1228338Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1228779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1228937Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1229369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1229488Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1229944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1230215Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1230658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1230778Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1231190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1231334Z     return self._compile_to_module()
2025-12-04T10:35:20.1231742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1231927Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1232363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1232480Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1232898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1233088Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1233594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1233699Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1234140Z   File "/tmp/tmph0jzyhxz/bd/cbdw6ykoryqwp5jpfuohx52saca4vpiha2kaxsik7mkvlyuo2clb.py", line 193, in <module>
2025-12-04T10:35:20.1234526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1234622Z     self._wait_futures(scope)
2025-12-04T10:35:20.1235044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1235143Z     kernel = result.result()
2025-12-04T10:35:20.1235513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1235610Z     return self.result_fn()
2025-12-04T10:35:20.1236011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1236119Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1236552Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1236557Z 
2025-12-04T10:35:20.1236728Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1236836Z Traceback (most recent call last):
2025-12-04T10:35:20.1237291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1237369Z     result = job()
2025-12-04T10:35:20.1237959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1238073Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1238549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1238643Z     self._precompile_worker()
2025-12-04T10:35:20.1239152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1239301Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1239811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1239981Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1240362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1240566Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1240939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1241219Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1241416Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1241836Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1241903Z ^
2025-12-04T10:35:20.1242335Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1242340Z 
2025-12-04T10:35:20.1242344Z 
2025-12-04T10:35:20.1242951Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1242957Z 
2025-12-04T10:35:20.1242961Z 
2025-12-04T10:35:20.1243143Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1243727Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1243735Z 
2025-12-04T10:35:20.1243953Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1244137Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1244218Z frames [('total', 1)]
2025-12-04T10:35:20.1244311Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1244877Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1245067Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1245146Z graph_break []
2025-12-04T10:35:20.1245290Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1245489Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1246559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1246656Z   if out == self.unknown_value:
2025-12-04T10:35:20.1246896Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1246994Z Traceback (most recent call last):
2025-12-04T10:35:20.1247324Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1247443Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1247937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1248156Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1248586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1248744Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1249179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1249301Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1249751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1250027Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1250467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1250588Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1250990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1251086Z     return self._compile_to_module()
2025-12-04T10:35:20.1251539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1251672Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1252151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1252255Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1252676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1252869Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1253363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1253463Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1253896Z   File "/tmp/tmp9zfsqhq3/e3/ce3pmk2zq62c6ibr5psczwpukuth2alrinz5z45k6dfg4uy46ltw.py", line 193, in <module>
2025-12-04T10:35:20.1254284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1254388Z     self._wait_futures(scope)
2025-12-04T10:35:20.1254808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1254900Z     kernel = result.result()
2025-12-04T10:35:20.1255283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1255372Z     return self.result_fn()
2025-12-04T10:35:20.1255774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1255885Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1256211Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1256218Z 
2025-12-04T10:35:20.1256391Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1256491Z Traceback (most recent call last):
2025-12-04T10:35:20.1256949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1257029Z     result = job()
2025-12-04T10:35:20.1257528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1257731Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1258198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1258291Z     self._precompile_worker()
2025-12-04T10:35:20.1258804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1258955Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1259551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1259735Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1260140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1260365Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1260766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1261066Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1261234Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1261681Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1261828Z ^
2025-12-04T10:35:20.1262213Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1262257Z 
2025-12-04T10:35:20.1262262Z 
2025-12-04T10:35:20.1262865Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1262874Z 
2025-12-04T10:35:20.1262878Z 
2025-12-04T10:35:20.1263062Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1263649Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1263654Z 
2025-12-04T10:35:20.1263878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1264056Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1264148Z frames [('total', 1)]
2025-12-04T10:35:20.1264240Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1264801Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1264996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1265077Z graph_break []
2025-12-04T10:35:20.1265226Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1265404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1266447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1266548Z   if out == self.unknown_value:
2025-12-04T10:35:20.1266722Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1266802Z frames [('total', 1)]
2025-12-04T10:35:20.1266896Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1267081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1267638Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1267823Z graph_break []
2025-12-04T10:35:20.1267969Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1268094Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1268328Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1268429Z Traceback (most recent call last):
2025-12-04T10:35:20.1268765Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1268879Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1269291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1269504Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1269936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1270100Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1270531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1270648Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1271101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1271415Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1271858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1272019Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1272422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1272522Z     return self._compile_to_module()
2025-12-04T10:35:20.1272933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1273065Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1273505Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1273608Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1274034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1274228Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1274727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1274841Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1275279Z   File "/tmp/tmpnm6gjk78/vl/cvlsw3ncyp3l7ltyeutz2mire53d3lqdt5pm63pej6dotq6ssgnm.py", line 193, in <module>
2025-12-04T10:35:20.1275662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1275753Z     self._wait_futures(scope)
2025-12-04T10:35:20.1276168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1276266Z     kernel = result.result()
2025-12-04T10:35:20.1276636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1276724Z     return self.result_fn()
2025-12-04T10:35:20.1277135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1277238Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1277563Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1277657Z 
2025-12-04T10:35:20.1277832Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1277927Z Traceback (most recent call last):
2025-12-04T10:35:20.1278385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1278460Z     result = job()
2025-12-04T10:35:20.1278961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1279078Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1279544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1279644Z     self._precompile_worker()
2025-12-04T10:35:20.1280145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1280294Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1280798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1280960Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1281342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1281593Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1281963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1282285Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1282442Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1282865Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1282934Z ^
2025-12-04T10:35:20.1283318Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1283323Z 
2025-12-04T10:35:20.1283327Z 
2025-12-04T10:35:20.1283933Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1283941Z 
2025-12-04T10:35:20.1283945Z 
2025-12-04T10:35:20.1284121Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1284714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1284718Z 
2025-12-04T10:35:20.1284940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1285123Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1285211Z frames [('total', 1)]
2025-12-04T10:35:20.1285303Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1285869Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1286052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1286130Z graph_break []
2025-12-04T10:35:20.1286275Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1286446Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1287486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1287662Z   if out == self.unknown_value:
2025-12-04T10:35:20.1287842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1287929Z frames [('total', 1)]
2025-12-04T10:35:20.1288023Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1288206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1288767Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1288845Z graph_break []
2025-12-04T10:35:20.1288995Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1289172Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1289249Z frames [('total', 1)]
2025-12-04T10:35:20.1289344Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1289527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1290085Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1290172Z graph_break []
2025-12-04T10:35:20.1290312Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1290871Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml -
2025-12-04T10:35:20.1291057Z =========================== short test summary info ============================
2025-12-04T10:35:20.1291792Z FAILED [0.7106s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1291837Z 
2025-12-04T10:35:20.1292010Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1292114Z Traceback (most recent call last):
2025-12-04T10:35:20.1292582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1292661Z     result = job()
2025-12-04T10:35:20.1293165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1293283Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1293761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1293855Z     self._precompile_worker()
2025-12-04T10:35:20.1294363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1294507Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1295017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1295179Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1295578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1295817Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1296190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1296475Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1296629Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1297043Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1297118Z ^
2025-12-04T10:35:20.1297611Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1297616Z 
2025-12-04T10:35:20.1297620Z 
2025-12-04T10:35:20.1298226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1298230Z 
2025-12-04T10:35:20.1298234Z 
2025-12-04T10:35:20.1298414Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1298996Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1299010Z 
2025-12-04T10:35:20.1299300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1299448Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1299629Z ================= 1 failed, 187 deselected, 2 rerun in 13.10s ==================
2025-12-04T10:35:20.1299710Z Got exit code 1
2025-12-04T10:35:20.1299793Z Retrying single test...
2025-12-04T10:35:20.1300193Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml
2025-12-04T10:35:20.1300324Z ============================= test session starts ==============================
2025-12-04T10:35:20.1300618Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1300749Z cachedir: .pytest_cache
2025-12-04T10:35:20.1301197Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1301341Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1301426Z configfile: pytest.ini
2025-12-04T10:35:20.1301883Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1302074Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.1302588Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1302687Z Running 1 items in this shard
2025-12-04T10:35:20.1302691Z 
2025-12-04T10:35:20.1303482Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:25.871446482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1303490Z 
2025-12-04T10:35:20.1303922Z [W1204 10:22:35.228605735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1303934Z 
2025-12-04T10:35:20.1304366Z [W1204 10:22:35.228842229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1304370Z 
2025-12-04T10:35:20.1304803Z [W1204 10:22:35.231152875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1304808Z 
2025-12-04T10:35:20.1305240Z [W1204 10:22:35.231348598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1305244Z 
2025-12-04T10:35:20.1305696Z [W1204 10:22:35.233470860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1305703Z 
2025-12-04T10:35:20.1306167Z [W1204 10:22:35.233758856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1306174Z 
2025-12-04T10:35:20.1306599Z [W1204 10:22:35.233925249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1306603Z 
2025-12-04T10:35:20.1307111Z [W1204 10:22:35.234331347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1307115Z 
2025-12-04T10:35:20.1307542Z [W1204 10:22:35.234506480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1307547Z 
2025-12-04T10:35:20.1312697Z [W1204 10:22:35.234967919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1312711Z 
2025-12-04T10:35:20.1313171Z [W1204 10:22:35.235146043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1313177Z 
2025-12-04T10:35:20.1313615Z [W1204 10:22:35.235497820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1313625Z 
2025-12-04T10:35:20.1314066Z [W1204 10:22:35.235664433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1314071Z 
2025-12-04T10:35:20.1314503Z [W1204 10:22:35.235975139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1314508Z 
2025-12-04T10:35:20.1314945Z [W1204 10:22:35.236141092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1315051Z 
2025-12-04T10:35:20.1315484Z [W1204 10:22:35.236459698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1315554Z 
2025-12-04T10:35:20.1315997Z [W1204 10:22:35.236630952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1316001Z 
2025-12-04T10:35:20.1316117Z ('RERUN', {'yellow': True}) [11.5590s] [100%]
2025-12-04T10:35:20.1316923Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:36.367496585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1316928Z 
2025-12-04T10:35:20.1317359Z [W1204 10:22:36.367839272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1317367Z 
2025-12-04T10:35:20.1317796Z [W1204 10:22:36.368005475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1317810Z 
2025-12-04T10:35:20.1318241Z [W1204 10:22:36.368473554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1318247Z 
2025-12-04T10:35:20.1318677Z [W1204 10:22:36.368653488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1318685Z 
2025-12-04T10:35:20.1319120Z [W1204 10:22:36.368947363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1319125Z 
2025-12-04T10:35:20.1319555Z [W1204 10:22:36.369187588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1319560Z 
2025-12-04T10:35:20.1319993Z [W1204 10:22:36.369350051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1319997Z 
2025-12-04T10:35:20.1320430Z [W1204 10:22:36.369718498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1320436Z 
2025-12-04T10:35:20.1320871Z [W1204 10:22:36.369886042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1320876Z 
2025-12-04T10:35:20.1321409Z [W1204 10:22:36.370293530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1321414Z 
2025-12-04T10:35:20.1321855Z [W1204 10:22:36.370466563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1321860Z 
2025-12-04T10:35:20.1322289Z [W1204 10:22:36.370789689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1322296Z 
2025-12-04T10:35:20.1322731Z [W1204 10:22:36.370955243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1322737Z 
2025-12-04T10:35:20.1323175Z [W1204 10:22:36.371268269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1323179Z 
2025-12-04T10:35:20.1323609Z [W1204 10:22:36.371433512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1323614Z 
2025-12-04T10:35:20.1324047Z [W1204 10:22:36.371742558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1324052Z 
2025-12-04T10:35:20.1324483Z [W1204 10:22:36.371906691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1324528Z 
2025-12-04T10:35:20.1324643Z ('RERUN', {'yellow': True}) [0.6935s] [100%]
2025-12-04T10:35:20.1325442Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:37.062598464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1325495Z 
2025-12-04T10:35:20.1325967Z [W1204 10:22:37.062947151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1325971Z 
2025-12-04T10:35:20.1326501Z [W1204 10:22:37.063114814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1326506Z 
2025-12-04T10:35:20.1326935Z [W1204 10:22:37.063590864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1326950Z 
2025-12-04T10:35:20.1327377Z [W1204 10:22:37.063764207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1327387Z 
2025-12-04T10:35:20.1327811Z [W1204 10:22:37.064056153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1327815Z 
2025-12-04T10:35:20.1328256Z [W1204 10:22:37.064297917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1328261Z 
2025-12-04T10:35:20.1328690Z [W1204 10:22:37.064458880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1328694Z 
2025-12-04T10:35:20.1329131Z [W1204 10:22:37.064820248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1329136Z 
2025-12-04T10:35:20.1329566Z [W1204 10:22:37.064988661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1329570Z 
2025-12-04T10:35:20.1330009Z [W1204 10:22:37.065360208 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1330013Z 
2025-12-04T10:35:20.1330441Z [W1204 10:22:37.065530101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1330528Z 
2025-12-04T10:35:20.1330959Z [W1204 10:22:37.065848088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1330968Z 
2025-12-04T10:35:20.1331398Z [W1204 10:22:37.066014671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1331405Z 
2025-12-04T10:35:20.1331831Z [W1204 10:22:37.066324457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1331835Z 
2025-12-04T10:35:20.1332267Z [W1204 10:22:37.066491010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1332273Z 
2025-12-04T10:35:20.1332701Z [W1204 10:22:37.066797916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1332713Z 
2025-12-04T10:35:20.1333144Z [W1204 10:22:37.066964219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1333148Z 
2025-12-04T10:35:20.1333232Z FAILED [0.7054s] [100%]
2025-12-04T10:35:20.1333236Z 
2025-12-04T10:35:20.1333363Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1333644Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1333750Z Traceback (most recent call last):
2025-12-04T10:35:20.1334091Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1334251Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1334673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1334893Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1335338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1335507Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1335940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1336061Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1336518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1336788Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1337234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1337355Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1337764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1337870Z     return self._compile_to_module()
2025-12-04T10:35:20.1338283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1338417Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1338858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1338960Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1339441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1339638Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1340244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1340356Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1340792Z   File "/tmp/tmpmamsjdzh/qd/cqd2dgft4negc555emrg7ptbhsxbirtnuwo2gvd4cvb77fi6j57d.py", line 193, in <module>
2025-12-04T10:35:20.1341179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1341271Z     self._wait_futures(scope)
2025-12-04T10:35:20.1341691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1341792Z     kernel = result.result()
2025-12-04T10:35:20.1342168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1342261Z     return self.result_fn()
2025-12-04T10:35:20.1342669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1342778Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1343110Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1343115Z 
2025-12-04T10:35:20.1343287Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1343389Z Traceback (most recent call last):
2025-12-04T10:35:20.1343855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1343984Z     result = job()
2025-12-04T10:35:20.1344492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1344647Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1345117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1345217Z     self._precompile_worker()
2025-12-04T10:35:20.1345768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1345924Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1346434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1346601Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1346988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1347193Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1347573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1347857Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1348021Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1348448Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1348519Z ^
2025-12-04T10:35:20.1348911Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1348918Z 
2025-12-04T10:35:20.1348922Z 
2025-12-04T10:35:20.1349539Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1349547Z 
2025-12-04T10:35:20.1349550Z 
2025-12-04T10:35:20.1349736Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1350332Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1350421Z 
2025-12-04T10:35:20.1350646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1350825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1350920Z frames [('total', 1)]
2025-12-04T10:35:20.1351012Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1351580Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1351774Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1351855Z graph_break []
2025-12-04T10:35:20.1352006Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1352188Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1353240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1353337Z   if out == self.unknown_value:
2025-12-04T10:35:20.1353579Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1353691Z Traceback (most recent call last):
2025-12-04T10:35:20.1354072Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1354188Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1354601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1354849Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1355293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1355469Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1355945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1356072Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1356530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1356813Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1357251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1357378Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1357792Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1357891Z     return self._compile_to_module()
2025-12-04T10:35:20.1358302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1358441Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1358878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1358992Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1359412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1359603Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1360108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1360216Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1360735Z   File "/tmp/tmpfustvwn0/74/c74sqjwxqkvttc5lr25tjtrfqrqrmohcrz2aeb3bsqq3sv4tobqm.py", line 193, in <module>
2025-12-04T10:35:20.1361117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1361211Z     self._wait_futures(scope)
2025-12-04T10:35:20.1361632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1361732Z     kernel = result.result()
2025-12-04T10:35:20.1362102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1362195Z     return self.result_fn()
2025-12-04T10:35:20.1362600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1362710Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1363037Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1363049Z 
2025-12-04T10:35:20.1363220Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1363322Z Traceback (most recent call last):
2025-12-04T10:35:20.1363779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1363861Z     result = job()
2025-12-04T10:35:20.1364360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1364521Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1365003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1365139Z     self._precompile_worker()
2025-12-04T10:35:20.1365676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1365850Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1366352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1366519Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1366896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1367101Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1367478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1367762Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1367922Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1368340Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1368406Z ^
2025-12-04T10:35:20.1368795Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1368800Z 
2025-12-04T10:35:20.1368804Z 
2025-12-04T10:35:20.1369406Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1369413Z 
2025-12-04T10:35:20.1369417Z 
2025-12-04T10:35:20.1369601Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1370194Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1370198Z 
2025-12-04T10:35:20.1370422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1370769Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1370853Z frames [('total', 1)]
2025-12-04T10:35:20.1370954Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1371517Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1371712Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1371795Z graph_break []
2025-12-04T10:35:20.1371938Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1372112Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1373162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1373264Z   if out == self.unknown_value:
2025-12-04T10:35:20.1373449Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1373534Z frames [('total', 1)]
2025-12-04T10:35:20.1373631Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1373823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1374390Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1374518Z graph_break []
2025-12-04T10:35:20.1374669Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1374852Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1375098Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1375203Z Traceback (most recent call last):
2025-12-04T10:35:20.1375543Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1375687Z     y_fp8 = compiled_fp8_matmul(x)  # noqa: F841
2025-12-04T10:35:20.1376129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1376346Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1376781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1376943Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1377378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1377500Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1377959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1378234Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1378678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1378804Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1379279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1379378Z     return self._compile_to_module()
2025-12-04T10:35:20.1379800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1379935Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1380380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1380571Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1380992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1381188Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1381684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1381793Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1382227Z   File "/tmp/tmpv0eg6o2x/jh/cjhw5ltfklcgqyvrv5j2bokd42ir36gfz77m6gmqk6yep6vcej2y.py", line 193, in <module>
2025-12-04T10:35:20.1382612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1382710Z     self._wait_futures(scope)
2025-12-04T10:35:20.1383130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1383220Z     kernel = result.result()
2025-12-04T10:35:20.1383594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1383683Z     return self.result_fn()
2025-12-04T10:35:20.1384094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1384247Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1384573Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1384578Z 
2025-12-04T10:35:20.1384758Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1384894Z Traceback (most recent call last):
2025-12-04T10:35:20.1385358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1385455Z     result = job()
2025-12-04T10:35:20.1386003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1386124Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1386594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1386686Z     self._precompile_worker()
2025-12-04T10:35:20.1387202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1387346Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1387863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1388026Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1388412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1388630Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1389001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1389282Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1389448Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1389865Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1389944Z ^
2025-12-04T10:35:20.1390331Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1390336Z 
2025-12-04T10:35:20.1390340Z 
2025-12-04T10:35:20.1391025Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1391037Z 
2025-12-04T10:35:20.1391041Z 
2025-12-04T10:35:20.1391220Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1391810Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1391818Z 
2025-12-04T10:35:20.1392048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1392229Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1392318Z frames [('total', 1)]
2025-12-04T10:35:20.1392412Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1392975Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1393174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1393252Z graph_break []
2025-12-04T10:35:20.1393398Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1393576Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1394617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1394761Z   if out == self.unknown_value:
2025-12-04T10:35:20.1394973Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1395051Z frames [('total', 1)]
2025-12-04T10:35:20.1395151Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1395334Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1395956Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1396033Z graph_break []
2025-12-04T10:35:20.1396175Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1396360Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1396446Z frames [('total', 1)]
2025-12-04T10:35:20.1396540Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1396730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1397286Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1397372Z graph_break []
2025-12-04T10:35:20.1397522Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1398080Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml -
2025-12-04T10:35:20.1398226Z =========================== short test summary info ============================
2025-12-04T10:35:20.1398964Z FAILED [0.7054s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1398972Z 
2025-12-04T10:35:20.1399145Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1399245Z Traceback (most recent call last):
2025-12-04T10:35:20.1399715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1399801Z     result = job()
2025-12-04T10:35:20.1400309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1400508Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1400990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1401087Z     self._precompile_worker()
2025-12-04T10:35:20.1401602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1401754Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1402256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1402438Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1402823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1403038Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1403417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1403700Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1403859Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1404275Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1404387Z ^
2025-12-04T10:35:20.1404783Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1404826Z 
2025-12-04T10:35:20.1404830Z 
2025-12-04T10:35:20.1405434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1405439Z 
2025-12-04T10:35:20.1405442Z 
2025-12-04T10:35:20.1405640Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1406227Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1406232Z 
2025-12-04T10:35:20.1406462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1406616Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1406788Z ================= 1 failed, 187 deselected, 2 rerun in 12.99s ==================
2025-12-04T10:35:20.1406874Z Got exit code 1
2025-12-04T10:35:20.1407260Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1407615Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.1408176Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml
2025-12-04T10:35:20.1408316Z ============================= test session starts ==============================
2025-12-04T10:35:20.1408614Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1408705Z cachedir: .pytest_cache
2025-12-04T10:35:20.1409150Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1409267Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1409358Z configfile: pytest.ini
2025-12-04T10:35:20.1409819Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1410012Z collecting ... collected 188 items / 23 deselected / 165 selected
2025-12-04T10:35:20.1410132Z stepcurrent: skipping 23 already run items.
2025-12-04T10:35:20.1410353Z Running 165 items in this shard
2025-12-04T10:35:20.1410358Z 
2025-12-04T10:35:20.1411147Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  0%]
2025-12-04T10:35:20.1411928Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  1%]
2025-12-04T10:35:20.1412695Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  1%]
2025-12-04T10:35:20.1413456Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  2%]
2025-12-04T10:35:20.1414688Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1415656Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1416167Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1416535Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1416983Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1417368Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1417819Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1418281Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1418778Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1419333Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1419808Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1420178Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1420615Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1421011Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1421402Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1421777Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1422326Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1422856Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1423325Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1423760Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1424256Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1424712Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1425199Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1425698Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1426180Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1426628Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1427098Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1427548Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1427949Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1428365Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1428863Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1429318Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1429804Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1430208Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1430586Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1431002Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1431374Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1431784Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1432232Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1432648Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1433074Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1433582Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1434147Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1434692Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1435093Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1435466Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1435950Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1436315Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1436809Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1437259Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1437696Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1438337Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1438976Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1439277Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1441077Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1441534Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1442429Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1442963Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1443716Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1444290Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1445038Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1445811Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1446343Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1447272Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1447588Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1448350Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1448461Z ('RERUN', {'yellow': True}) [1.7564s] [  3%]
2025-12-04T10:35:20.1449695Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1450616Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1451020Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1451424Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1451865Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1452254Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1452704Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1453163Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1453655Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1454159Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1454635Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1455000Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1455441Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1455883Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1456270Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1456646Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1457189Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1457739Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1458199Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1458626Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1459165Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1459616Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1460104Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1460556Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1461034Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1461485Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1461961Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1462369Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1462808Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1463217Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1463712Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1464164Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1464650Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1465048Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1465418Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1465840Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1466210Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1466608Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1467051Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1467454Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1467877Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1468374Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1468935Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1469469Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1469872Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1470245Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1470728Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1471093Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1471576Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1472021Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1472449Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1473093Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1473685Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1474026Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1475861Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1476320Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1477209Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1477742Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1478491Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1479064Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1479809Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1480534Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1481058Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1481981Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1482286Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1483043Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1483148Z ('RERUN', {'yellow': True}) [0.3115s] [  3%]
2025-12-04T10:35:20.1484375Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1485297Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1485703Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1486144Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1486600Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1486984Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1487433Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1487886Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1488376Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1488872Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1489341Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1489705Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1490144Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1490537Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1490924Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1491294Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1491835Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1492361Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1492821Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1493246Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1493734Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1494180Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1494671Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1495123Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1495603Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1496049Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1496521Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1496926Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1497365Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1497772Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1498265Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1498720Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1499271Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1499667Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1500034Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1500445Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1500812Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1501210Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1501652Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1502055Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1502478Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1502973Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1503563Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1504098Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1504500Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1504870Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1505353Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1505768Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1506249Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1506697Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1507127Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1508035Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1508630Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1508998Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1510779Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1511236Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1512120Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1512651Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1513406Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1513978Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1514727Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1515492Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1516050Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1516973Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1517276Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1518039Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1518120Z FAILED [0.3105s] [  3%]
2025-12-04T10:35:20.1518125Z 
2025-12-04T10:35:20.1518250Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1518582Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1518680Z Traceback (most recent call last):
2025-12-04T10:35:20.1519036Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1519230Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1519705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1519910Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1520382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1520546Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1520978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1521097Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1521549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1521819Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1522264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1522384Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1522788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1522888Z     return self._compile_to_module()
2025-12-04T10:35:20.1523298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1523435Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1523868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1523973Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1524392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1524584Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1525082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1525186Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1525616Z   File "/tmp/tmpcnc9szkk/wj/cwjkbv2lw3skfclmw777nmxunwaxbevv7qru57jg4as3bbpjft7k.py", line 74, in <module>
2025-12-04T10:35:20.1526088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1526177Z     kernel.precompile(
2025-12-04T10:35:20.1526644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1526743Z     self._precompile_worker()
2025-12-04T10:35:20.1527246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1527398Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1527901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1528067Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1528446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1528655Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1529027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1529307Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1529496Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1530094Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1530202Z ^
2025-12-04T10:35:20.1530591Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1530596Z 
2025-12-04T10:35:20.1531207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1531212Z 
2025-12-04T10:35:20.1531216Z 
2025-12-04T10:35:20.1531393Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1532135Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1532143Z 
2025-12-04T10:35:20.1532363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1532543Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1532626Z frames [('total', 1)]
2025-12-04T10:35:20.1532719Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1533122Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1533307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1533388Z graph_break []
2025-12-04T10:35:20.1533715Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1533814Z Traceback (most recent call last):
2025-12-04T10:35:20.1534171Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1534362Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1534775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1534986Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1535422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1535583Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1536144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1536263Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1536720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1536988Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1537431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1537550Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1537954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1538054Z     return self._compile_to_module()
2025-12-04T10:35:20.1538464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1538597Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1539095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1539197Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1539620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1539859Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1540353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1540600Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1541027Z   File "/tmp/tmpam99xy4e/4s/c4s34qh73ne2mlkpkfnfdlwirckkgehnjms2cbxbkqelul36m5wz.py", line 74, in <module>
2025-12-04T10:35:20.1541424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1541510Z     kernel.precompile(
2025-12-04T10:35:20.1541980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1542076Z     self._precompile_worker()
2025-12-04T10:35:20.1542578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1542726Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1543231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1543396Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1543777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1543982Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1544351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1544636Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1544824Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1545383Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1545458Z ^
2025-12-04T10:35:20.1545886Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1545892Z 
2025-12-04T10:35:20.1546585Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1546591Z 
2025-12-04T10:35:20.1546595Z 
2025-12-04T10:35:20.1546772Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1547507Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1547514Z 
2025-12-04T10:35:20.1547734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1547910Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1547995Z frames [('total', 1)]
2025-12-04T10:35:20.1548087Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1548488Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1548673Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1548748Z graph_break []
2025-12-04T10:35:20.1548926Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1549003Z frames [('total', 1)]
2025-12-04T10:35:20.1549094Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1549276Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1549666Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1549793Z graph_break []
2025-12-04T10:35:20.1549917Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1550372Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1550520Z Traceback (most recent call last):
2025-12-04T10:35:20.1550983Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1551247Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1551742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1551952Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1552390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1552554Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1552985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1553110Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1553557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1553834Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1554273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1554389Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1554793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1554892Z     return self._compile_to_module()
2025-12-04T10:35:20.1555298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1555437Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1555874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1555980Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1556492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1556686Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1557187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1557288Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1557721Z   File "/tmp/tmppr6xkt76/7c/c7cvhkxncxlxgqtbu2pkgagh6dfdr57u2aj6y5gtjkz723m3hp2g.py", line 74, in <module>
2025-12-04T10:35:20.1558108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1558196Z     kernel.precompile(
2025-12-04T10:35:20.1558666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1558757Z     self._precompile_worker()
2025-12-04T10:35:20.1559266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1559415Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1559917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1560126Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1560501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1560702Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1561118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1561398Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1561600Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1562150Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1562218Z ^
2025-12-04T10:35:20.1562610Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1562620Z 
2025-12-04T10:35:20.1563222Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1563229Z 
2025-12-04T10:35:20.1563233Z 
2025-12-04T10:35:20.1563413Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1564148Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1564153Z 
2025-12-04T10:35:20.1564471Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1564650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1564729Z frames [('total', 1)]
2025-12-04T10:35:20.1564826Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1565225Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1565433Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1565525Z graph_break []
2025-12-04T10:35:20.1565720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1565801Z frames [('total', 1)]
2025-12-04T10:35:20.1565903Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1566082Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1566564Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1566641Z graph_break []
2025-12-04T10:35:20.1566818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1566905Z frames [('total', 1)]
2025-12-04T10:35:20.1566994Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1567178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1567570Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1567648Z graph_break []
2025-12-04T10:35:20.1568207Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml -
2025-12-04T10:35:20.1568350Z =========================== short test summary info ============================
2025-12-04T10:35:20.1569067Z FAILED [0.3105s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1569619Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1569729Z ^
2025-12-04T10:35:20.1570118Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1570123Z 
2025-12-04T10:35:20.1570724Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1570771Z 
2025-12-04T10:35:20.1570775Z 
2025-12-04T10:35:20.1570961Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1571693Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1571698Z 
2025-12-04T10:35:20.1571920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1572072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1572254Z ============= 1 failed, 4 skipped, 23 deselected, 2 rerun in 2.42s =============
2025-12-04T10:35:20.1572331Z Got exit code 1
2025-12-04T10:35:20.1572418Z Retrying single test...
2025-12-04T10:35:20.1572819Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml
2025-12-04T10:35:20.1572959Z ============================= test session starts ==============================
2025-12-04T10:35:20.1573248Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1573337Z cachedir: .pytest_cache
2025-12-04T10:35:20.1573785Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1573883Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1573967Z configfile: pytest.ini
2025-12-04T10:35:20.1574427Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1574615Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.1575285Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1575379Z Running 1 items in this shard
2025-12-04T10:35:20.1575385Z 
2025-12-04T10:35:20.1576745Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1577672Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1578029Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1578404Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1578836Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1579298Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1579747Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1580198Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1584323Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1584823Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1585373Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1585794Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1586233Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1586638Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1587021Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1587401Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1587949Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1588401Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1588859Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1589283Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1589776Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1590226Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1590721Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1591251Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1591730Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1592183Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1592609Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1593031Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1593439Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1593838Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1594339Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1594792Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1595281Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1595722Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1596083Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1596539Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1596909Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1597316Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1597759Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1598158Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1598596Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1599089Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1599583Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1600118Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1600525Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1600895Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1601379Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1601751Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1602231Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1602767Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1603199Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1603794Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1604393Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1604698Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1606553Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1607048Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1608147Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1608843Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1609894Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1610475Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1611224Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1611880Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1612397Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1613328Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1613633Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1614395Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1614509Z ('RERUN', {'yellow': True}) [1.7509s] [100%]
2025-12-04T10:35:20.1615869Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1616799Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1617160Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1617532Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1617967Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1618360Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1618809Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1619338Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1619897Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1620386Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1620899Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1621269Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1621706Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1622105Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1622484Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1622860Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1623400Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1623840Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1624313Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1624736Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1625230Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1625680Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1626172Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1626618Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1627679Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1628144Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1628573Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1628993Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1629393Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1629795Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1630301Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1630751Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1631241Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1631684Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1632047Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1632506Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1632881Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1633284Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1633727Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1634126Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1634557Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1635051Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1635543Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1636081Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1636481Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1636861Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1637345Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1637715Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1638194Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1638727Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1639159Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1639752Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1640352Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1640653Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1642436Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1642928Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1643816Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1644395Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1645152Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1645751Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1646519Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1647182Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1647703Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1648633Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1648939Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1649709Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1649820Z ('RERUN', {'yellow': True}) [0.3103s] [100%]
2025-12-04T10:35:20.1651122Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1652049Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1652505Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1652879Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1653315Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1653708Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1654157Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1654610Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1655104Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1655663Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1656204Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1656575Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1657010Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1657408Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1657790Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1658167Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1658710Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1659252Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1659721Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1660146Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1660637Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1661090Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1661577Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1662030Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1662592Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1663047Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1663475Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1663887Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1664293Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1664699Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1665201Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1665661Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1666149Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1666590Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1666954Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1667412Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1667779Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1668187Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1668633Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1669036Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1669474Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1669971Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1670461Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1671004Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1671406Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1671779Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1672263Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1672642Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1673123Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1673671Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1674109Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1674702Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1675301Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1675603Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1677399Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1677896Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1678793Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1679369Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1680121Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1680703Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1681453Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1682112Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1682632Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1683570Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1683876Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1684634Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1684723Z FAILED [0.3099s] [100%]
2025-12-04T10:35:20.1684728Z 
2025-12-04T10:35:20.1684845Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1685264Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1685367Z Traceback (most recent call last):
2025-12-04T10:35:20.1685758Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1685979Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1686393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1686613Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1687050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1687211Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1687657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1687784Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1688248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1688521Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1688965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1689139Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1689550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1689690Z     return self._compile_to_module()
2025-12-04T10:35:20.1690108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1690249Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1690702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1690812Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1691232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1691433Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1691937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1692049Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1692491Z   File "/tmp/tmpjlk1b9r6/we/cwegshnuygtwiswwxzaf2pjal5zweorw4eqvim6llayn5yzsw7x3.py", line 74, in <module>
2025-12-04T10:35:20.1692885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1692985Z     kernel.precompile(
2025-12-04T10:35:20.1693455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1693552Z     self._precompile_worker()
2025-12-04T10:35:20.1694060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1694209Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1694718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1694885Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1695264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1695475Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1695977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1696273Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1696472Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1697025Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1697111Z ^
2025-12-04T10:35:20.1697505Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1697512Z 
2025-12-04T10:35:20.1698124Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1698129Z 
2025-12-04T10:35:20.1698133Z 
2025-12-04T10:35:20.1698319Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1699110Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1699120Z 
2025-12-04T10:35:20.1699344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1699569Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1699663Z frames [('total', 1)]
2025-12-04T10:35:20.1699758Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1700162Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1700395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1700475Z graph_break []
2025-12-04T10:35:20.1700813Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1700917Z Traceback (most recent call last):
2025-12-04T10:35:20.1701274Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1701481Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1701891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1702100Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1702542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1702702Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1703137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1703260Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1703709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1703985Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1704420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1704545Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1704961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1705063Z     return self._compile_to_module()
2025-12-04T10:35:20.1705477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1705610Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1706126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1706238Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1706655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1706851Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1707348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1707453Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1708040Z   File "/tmp/tmp9y6bp0ro/xt/cxtfgml7cumopkfnyygflis3np74vyxgthfe6e2vihzv2h2hmbwk.py", line 74, in <module>
2025-12-04T10:35:20.1708432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1708530Z     kernel.precompile(
2025-12-04T10:35:20.1709006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1709099Z     self._precompile_worker()
2025-12-04T10:35:20.1709615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1709839Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1710344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1710517Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1710979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1711199Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1711578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1711858Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1712056Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1712616Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1712699Z ^
2025-12-04T10:35:20.1713089Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1713097Z 
2025-12-04T10:35:20.1713697Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1713702Z 
2025-12-04T10:35:20.1713706Z 
2025-12-04T10:35:20.1713897Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1714632Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1714638Z 
2025-12-04T10:35:20.1714870Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1715055Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1715140Z frames [('total', 1)]
2025-12-04T10:35:20.1715243Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1715683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1715887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1715970Z graph_break []
2025-12-04T10:35:20.1716259Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1716349Z frames [('total', 1)]
2025-12-04T10:35:20.1716440Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1716625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1717026Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1717106Z graph_break []
2025-12-04T10:35:20.1717234Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1717565Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1717669Z Traceback (most recent call last):
2025-12-04T10:35:20.1718038Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1718235Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1718658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1718870Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1719305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1719471Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1720026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1720145Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1720642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1720910Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1721366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1721488Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1721894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1722004Z     return self._compile_to_module()
2025-12-04T10:35:20.1722417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1722556Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1722996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1723106Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1723535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1723732Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1724228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1724336Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1724744Z   File "/tmp/tmp42c1_5rp/u4/cu4omle2eh76yjdjzlb4zy2vipe7e6uz5ek2bfltn36tqjrkzszq.py", line 74, in <module>
2025-12-04T10:35:20.1725152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1725240Z     kernel.precompile(
2025-12-04T10:35:20.1725717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1725817Z     self._precompile_worker()
2025-12-04T10:35:20.1726325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1726554Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1727065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1727236Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1727619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1727828Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1728199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1728491Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1728681Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1729240Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1729309Z ^
2025-12-04T10:35:20.1729700Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1729705Z 
2025-12-04T10:35:20.1730318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1730366Z 
2025-12-04T10:35:20.1730370Z 
2025-12-04T10:35:20.1730550Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1731333Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1731338Z 
2025-12-04T10:35:20.1731567Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1731745Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1731832Z frames [('total', 1)]
2025-12-04T10:35:20.1731928Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1732326Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1732514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1732593Z graph_break []
2025-12-04T10:35:20.1732773Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1732856Z frames [('total', 1)]
2025-12-04T10:35:20.1732948Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1733141Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1733536Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1733618Z graph_break []
2025-12-04T10:35:20.1733792Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1733871Z frames [('total', 1)]
2025-12-04T10:35:20.1733965Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1734149Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1734545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1734626Z graph_break []
2025-12-04T10:35:20.1735180Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml -
2025-12-04T10:35:20.1735326Z =========================== short test summary info ============================
2025-12-04T10:35:20.1736177Z FAILED [0.3099s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1736728Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1736800Z ^
2025-12-04T10:35:20.1737187Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1737194Z 
2025-12-04T10:35:20.1737801Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1737808Z 
2025-12-04T10:35:20.1737811Z 
2025-12-04T10:35:20.1737986Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1738723Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1738733Z 
2025-12-04T10:35:20.1738954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1739167Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1739339Z ================== 1 failed, 187 deselected, 2 rerun in 2.41s ==================
2025-12-04T10:35:20.1739416Z Got exit code 1
2025-12-04T10:35:20.1739545Z Retrying single test...
2025-12-04T10:35:20.1739949Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml
2025-12-04T10:35:20.1740080Z ============================= test session starts ==============================
2025-12-04T10:35:20.1740412Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1740497Z cachedir: .pytest_cache
2025-12-04T10:35:20.1740950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1741056Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1741141Z configfile: pytest.ini
2025-12-04T10:35:20.1741599Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1741788Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.1742450Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1742549Z Running 1 items in this shard
2025-12-04T10:35:20.1742556Z 
2025-12-04T10:35:20.1743785Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1744715Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1745074Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1745445Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1745933Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1746320Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1746865Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1747319Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1747808Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1748307Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1748772Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1749149Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1749591Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1749985Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1750374Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1750743Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1751354Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1751831Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1752299Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1752721Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1753212Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1753666Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1754157Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1754610Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1755089Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1755560Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1756014Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1756420Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1756828Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1757233Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1757727Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1758265Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1758749Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1759150Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1759520Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1759927Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1760298Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1760699Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1761145Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1761545Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1761966Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1762503Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1763029Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1763567Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1763967Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1764340Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1764815Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1765184Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1765666Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1766120Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1766561Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1767152Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1767746Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1768053Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1769915Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1770370Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1771254Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1771797Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1772550Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1773129Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1773873Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1774565Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1775118Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1776095Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1776402Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1777165Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1777276Z ('RERUN', {'yellow': True}) [1.7434s] [100%]
2025-12-04T10:35:20.1778500Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1779468Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1779822Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1780188Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1780625Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1781013Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1781563Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1782015Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1782507Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1783003Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1783473Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1783848Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1784292Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1784689Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1785075Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1785444Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1786082Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1786566Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1787029Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1787460Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1787951Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1788400Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1788891Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1789346Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1789827Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1790274Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1790701Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1791104Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1791507Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1791908Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1792399Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1792966Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1793449Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1793849Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1794214Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1794621Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1794991Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1795394Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1795842Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1796246Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1796667Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1797209Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1797733Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1798273Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1798674Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1799044Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1799526Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1799892Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1800385Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1800830Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1801268Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1801858Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1802453Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1802758Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1804625Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1805081Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1806015Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1806551Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1807305Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1808028Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1808778Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1809542Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1810150Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1811149Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1811477Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1812291Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1812409Z ('RERUN', {'yellow': True}) [0.3082s] [100%]
2025-12-04T10:35:20.1813722Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1814712Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1815102Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1815500Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.1816011Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1816425Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1817015Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1817552Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1818186Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1818833Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1819349Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1819722Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1820163Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1820557Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1820940Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1821309Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.1821923Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1822408Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1822872Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1823299Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1823787Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1824237Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1824723Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1825175Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1825668Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1826114Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1826547Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1826950Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1827348Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1827750Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1828321Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1829057Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1829671Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1830196Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1830668Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.1831207Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1831748Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.1832310Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1832963Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1833552Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1834179Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1835173Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1836071Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1836882Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1837451Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1838014Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.1838757Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1839317Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.1839993Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1840653Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1841317Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1842224Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1843100Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1843561Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1846435Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1847165Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1848477Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1849322Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1850501Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1851384Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1852495Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1853640Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1854540Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1855952Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1856450Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1857553Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1857698Z FAILED [0.3079s] [100%]
2025-12-04T10:35:20.1857722Z 
2025-12-04T10:35:20.1857893Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1858394Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1858543Z Traceback (most recent call last):
2025-12-04T10:35:20.1859099Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1859445Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1860072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1860357Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1860917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1861148Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1861737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1861920Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1862684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1863081Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1863763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1863946Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1864554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1864712Z     return self._compile_to_module()
2025-12-04T10:35:20.1865343Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1865618Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1866235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1866415Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1867002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1867305Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1868044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1868310Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1868892Z   File "/tmp/tmpscihzwt2/5l/c5lixojicqkkihemc4dhkmp3kh4lt5ommxwfumeppk7vvctzoxen.py", line 74, in <module>
2025-12-04T10:35:20.1869518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1869658Z     kernel.precompile(
2025-12-04T10:35:20.1870345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1870495Z     self._precompile_worker()
2025-12-04T10:35:20.1871283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1871518Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1872251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1872507Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1873034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1873344Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1873892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1874328Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1874610Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1875489Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1875611Z ^
2025-12-04T10:35:20.1876207Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1876216Z 
2025-12-04T10:35:20.1877122Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1877138Z 
2025-12-04T10:35:20.1877144Z 
2025-12-04T10:35:20.1877432Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1878726Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1878738Z 
2025-12-04T10:35:20.1879105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1879374Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1879558Z frames [('total', 1)]
2025-12-04T10:35:20.1879720Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1880311Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1880621Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1880756Z graph_break []
2025-12-04T10:35:20.1881256Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1881427Z Traceback (most recent call last):
2025-12-04T10:35:20.1881988Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1882289Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1882925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1883257Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1884059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1884318Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1885038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1885252Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1885946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1886359Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1887043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1887246Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1887885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1888052Z     return self._compile_to_module()
2025-12-04T10:35:20.1888700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1888920Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1889588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1889786Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1890441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1890749Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1891533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1891707Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1892352Z   File "/tmp/tmp4_1w8kz3/uc/cucaizpc6iis4deacgalizwass7uald6fjcjnd673j3ncjzfdlxf.py", line 74, in <module>
2025-12-04T10:35:20.1892930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1893074Z     kernel.precompile(
2025-12-04T10:35:20.1893908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1894054Z     self._precompile_worker()
2025-12-04T10:35:20.1894676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1894834Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1895341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1895544Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1895950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1896158Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1896539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1896825Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1897033Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1897591Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1897663Z ^
2025-12-04T10:35:20.1898136Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1898142Z 
2025-12-04T10:35:20.1898749Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1898824Z 
2025-12-04T10:35:20.1898828Z 
2025-12-04T10:35:20.1899019Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1899866Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1899872Z 
2025-12-04T10:35:20.1900640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1900885Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1901015Z frames [('total', 1)]
2025-12-04T10:35:20.1901173Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1901686Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1901946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1902075Z graph_break []
2025-12-04T10:35:20.1902319Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1902443Z frames [('total', 1)]
2025-12-04T10:35:20.1902588Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1902825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1903396Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1903533Z graph_break []
2025-12-04T10:35:20.1903713Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1904179Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1904332Z Traceback (most recent call last):
2025-12-04T10:35:20.1904802Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1905095Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1905654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1906065Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1906528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1906698Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1907150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1907292Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1908022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1908332Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1908788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1915252Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1915893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1916047Z     return self._compile_to_module()
2025-12-04T10:35:20.1916970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1917174Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1918088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1918282Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1919004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1919294Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1919958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1920081Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1920529Z   File "/tmp/tmpx3a3cv_r/wg/cwgbdfiglz22weuk7ohiniahjmveykawtvighr3gy7gyp446qejl.py", line 74, in <module>
2025-12-04T10:35:20.1920939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1921052Z     kernel.precompile(
2025-12-04T10:35:20.1921537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1921646Z     self._precompile_worker()
2025-12-04T10:35:20.1922178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1922339Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1922873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1923049Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1923442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1923667Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1924059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1924364Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1924571Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1925139Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1925472Z ^
2025-12-04T10:35:20.1925907Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1925913Z 
2025-12-04T10:35:20.1926542Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1926547Z 
2025-12-04T10:35:20.1926554Z 
2025-12-04T10:35:20.1926747Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1927505Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1927522Z 
2025-12-04T10:35:20.1927758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1927955Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1928062Z frames [('total', 1)]
2025-12-04T10:35:20.1928167Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1928581Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1928787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1928876Z graph_break []
2025-12-04T10:35:20.1929117Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1929208Z frames [('total', 1)]
2025-12-04T10:35:20.1929311Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1929514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1929961Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1930048Z graph_break []
2025-12-04T10:35:20.1930249Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1930340Z frames [('total', 1)]
2025-12-04T10:35:20.1930444Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1930643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1931046Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1931138Z graph_break []
2025-12-04T10:35:20.1931711Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml -
2025-12-04T10:35:20.1931863Z =========================== short test summary info ============================
2025-12-04T10:35:20.1932613Z FAILED [0.3079s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1933181Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1933265Z ^
2025-12-04T10:35:20.1933664Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1933669Z 
2025-12-04T10:35:20.1934381Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1934399Z 
2025-12-04T10:35:20.1934403Z 
2025-12-04T10:35:20.1934595Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1935347Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1935352Z 
2025-12-04T10:35:20.1935725Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1935895Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1936085Z ================== 1 failed, 187 deselected, 2 rerun in 2.39s ==================
2025-12-04T10:35:20.1936168Z Got exit code 1
2025-12-04T10:35:20.1936731Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1937115Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.1937543Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml
2025-12-04T10:35:20.1937693Z ============================= test session starts ==============================
2025-12-04T10:35:20.1938018Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1938113Z cachedir: .pytest_cache
2025-12-04T10:35:20.1938601Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1938714Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1938811Z configfile: pytest.ini
2025-12-04T10:35:20.1939385Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1939640Z collecting ... collected 188 items / 28 deselected / 160 selected
2025-12-04T10:35:20.1939768Z stepcurrent: skipping 28 already run items.
2025-12-04T10:35:20.1939878Z Running 160 items in this shard
2025-12-04T10:35:20.1939922Z 
2025-12-04T10:35:20.1941176Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.1942180Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1942552Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1942946Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.1943397Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.1943802Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1944286Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1944755Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1945264Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1945778Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1946267Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1946651Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1947185Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1947601Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1948006Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1948398Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.1948823Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.1949384Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1950002Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.1950596Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.1951065Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.1951615Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.1952059Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1952510Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.1952890Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.1953307Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.1953689Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.1954089Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.1954552Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.1954959Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.1955415Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.1955929Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1956435Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.1957000Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1957418Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.1957810Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.1958306Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.1958765Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.1959273Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.1959733Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.1960188Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.1960794Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.1961413Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.1961735Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.1963801Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1964346Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1965259Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1965858Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1966627Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1967226Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1968343Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1969330Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1970088Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1971525Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1971957Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1973329Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1973511Z ('RERUN', {'yellow': True}) [1.9610s] [  0%]
2025-12-04T10:35:20.1975170Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.1976497Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1977003Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.1977585Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.1978198Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.1978781Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.1979698Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1980473Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1981225Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1981987Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1982714Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1983287Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.1983968Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1984972Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.1985886Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.1986486Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.1987118Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.1987942Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1988854Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.1989730Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.1990436Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.1991297Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.1991975Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1992560Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.1993116Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.1993720Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.1994291Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.1994883Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.1995590Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.1996227Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.1996877Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.1997746Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1998569Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.1999398Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2000017Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2000599Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2001322Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2001891Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2002607Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2003312Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2003979Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2004895Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2005881Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2006350Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2009800Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2010530Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2011854Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2012674Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2013825Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2014710Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2016013Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2017121Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2017920Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2019485Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2019952Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2021098Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2021289Z ('RERUN', {'yellow': True}) [0.4914s] [  0%]
2025-12-04T10:35:20.2023123Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2024610Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2025180Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2025762Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2026411Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2027229Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2027946Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2028622Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2029391Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2030138Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2030846Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2031430Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2032089Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2032717Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2033297Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2033957Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2034563Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2035469Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2036414Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2037314Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2038006Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2038691Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2039331Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2039953Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2040504Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2041125Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2041670Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2042271Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2042934Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2043540Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2044344Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2045110Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2045828Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2046670Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2047296Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2047873Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2048631Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2049180Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2049935Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2050621Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2051378Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2052334Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2053269Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2053750Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2056807Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2057532Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2058858Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2059786Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2060928Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2061805Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2063073Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2064071Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2064851Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2066325Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2066816Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2067953Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2068102Z FAILED [0.4941s] [  0%]
2025-12-04T10:35:20.2068111Z 
2025-12-04T10:35:20.2068293Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.2068885Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2069052Z Traceback (most recent call last):
2025-12-04T10:35:20.2069673Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2069991Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2070628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2070942Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2071622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2071869Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2075631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2075920Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2076607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2077036Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2077706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2077909Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2078548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2078704Z     return self._compile_to_module()
2025-12-04T10:35:20.2079325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2079571Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2080225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2080416Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2081309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2081624Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2082494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2082669Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2083331Z   File "/tmp/tmp_lq9uezc/nk/cnkstcietkbkskwkvzuxgmyote4ffwvprahqkurchqshgwaa7ztm.py", line 137, in <module>
2025-12-04T10:35:20.2083914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2084071Z     kernel.precompile(
2025-12-04T10:35:20.2084814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2084962Z     self._precompile_worker()
2025-12-04T10:35:20.2085740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2085974Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2086727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2086996Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2087535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2087961Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2088530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2088974Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2089356Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2090264Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2090381Z ^
2025-12-04T10:35:20.2090976Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2090986Z 
2025-12-04T10:35:20.2091884Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2091899Z 
2025-12-04T10:35:20.2092037Z 
2025-12-04T10:35:20.2092332Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2093450Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2093466Z 
2025-12-04T10:35:20.2093804Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2094111Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2094243Z frames [('total', 1)]
2025-12-04T10:35:20.2094399Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2094993Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2095295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2095433Z graph_break []
2025-12-04T10:35:20.2095939Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2096101Z Traceback (most recent call last):
2025-12-04T10:35:20.2096654Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2096942Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2097687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2098011Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2098669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2098918Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2099675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2099877Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2100550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2100985Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2101657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2101844Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2102468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2102631Z     return self._compile_to_module()
2025-12-04T10:35:20.2103277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2103595Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2104280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2104518Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2105159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2105470Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2106259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2106425Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2107070Z   File "/tmp/tmpkl4yim31/lg/clgiw6lh6c2gmqnklcjejrrrlzrz7tvt2kmedr33sxktzyowcohg.py", line 137, in <module>
2025-12-04T10:35:20.2107986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2108146Z     kernel.precompile(
2025-12-04T10:35:20.2108867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2109026Z     self._precompile_worker()
2025-12-04T10:35:20.2109786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2110032Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2110784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2111050Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2111636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2111966Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2112550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2112972Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2113274Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2114325Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2114454Z ^
2025-12-04T10:35:20.2115059Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2115070Z 
2025-12-04T10:35:20.2116007Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2116027Z 
2025-12-04T10:35:20.2116034Z 
2025-12-04T10:35:20.2116318Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2117433Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2117448Z 
2025-12-04T10:35:20.2117805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2118101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2118224Z frames [('total', 1)]
2025-12-04T10:35:20.2118378Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2118989Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2119274Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2119541Z graph_break []
2025-12-04T10:35:20.2119818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2119953Z frames [('total', 1)]
2025-12-04T10:35:20.2120212Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2120499Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2121084Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2121215Z graph_break []
2025-12-04T10:35:20.2121406Z =================================== FAILURES ===================================
2025-12-04T10:35:20.2121913Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2122083Z Traceback (most recent call last):
2025-12-04T10:35:20.2122620Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2123062Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2123719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2124047Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2124698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2124949Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2125610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2125807Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2126500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2126922Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2127598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2127811Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2128417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2128571Z     return self._compile_to_module()
2025-12-04T10:35:20.2129284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2129504Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2130166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2130346Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2130966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2131288Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2132056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2132231Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2132866Z   File "/tmp/tmpdds3g8_9/vn/cvn2vo7n7mxdtr6e5zhza3xkubbm6tuglkrgpdglrugi5n7ay5il.py", line 137, in <module>
2025-12-04T10:35:20.2133460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2133622Z     kernel.precompile(
2025-12-04T10:35:20.2134336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2134493Z     self._precompile_worker()
2025-12-04T10:35:20.2135382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2135618Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2136498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2136744Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2137288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2137584Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2138104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2138548Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2138844Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2139884Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2140056Z ^
2025-12-04T10:35:20.2140632Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2140642Z 
2025-12-04T10:35:20.2141565Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2141575Z 
2025-12-04T10:35:20.2141582Z 
2025-12-04T10:35:20.2141875Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2142969Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2142995Z 
2025-12-04T10:35:20.2143351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2143633Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2143783Z frames [('total', 1)]
2025-12-04T10:35:20.2143937Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2144545Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2144940Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2145082Z graph_break []
2025-12-04T10:35:20.2145368Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2145511Z frames [('total', 1)]
2025-12-04T10:35:20.2145657Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2145946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2146569Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2146696Z graph_break []
2025-12-04T10:35:20.2146989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2147123Z frames [('total', 1)]
2025-12-04T10:35:20.2147276Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2147567Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2148173Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2148309Z graph_break []
2025-12-04T10:35:20.2149139Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml -
2025-12-04T10:35:20.2149374Z =========================== short test summary info ============================
2025-12-04T10:35:20.2150554Z FAILED [0.4941s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2151557Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2151679Z ^
2025-12-04T10:35:20.2152280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2152290Z 
2025-12-04T10:35:20.2153207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2153217Z 
2025-12-04T10:35:20.2153232Z 
2025-12-04T10:35:20.2153519Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2154744Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2154756Z 
2025-12-04T10:35:20.2155122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2155350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.2155617Z ================== 1 failed, 28 deselected, 2 rerun in 2.98s ===================
2025-12-04T10:35:20.2155745Z Got exit code 1
2025-12-04T10:35:20.2155878Z Retrying single test...
2025-12-04T10:35:20.2156484Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml
2025-12-04T10:35:20.2156702Z ============================= test session starts ==============================
2025-12-04T10:35:20.2157149Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.2157309Z cachedir: .pytest_cache
2025-12-04T10:35:20.2157981Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.2158157Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.2158301Z configfile: pytest.ini
2025-12-04T10:35:20.2159000Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.2159398Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.2160427Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2160580Z Running 1 items in this shard
2025-12-04T10:35:20.2160588Z 
2025-12-04T10:35:20.2162450Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2163949Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2164525Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2165113Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2165832Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2166546Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2167234Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2168013Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2168788Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2169557Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2170278Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2171043Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2171730Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2172337Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2172942Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2173513Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2174158Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2174981Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2175928Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2176965Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2177758Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2178479Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2179230Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2179842Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2180424Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2181039Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2181614Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2182217Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2182896Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2183518Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2184278Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2185058Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2185902Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2186745Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2187374Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2187960Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2188824Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2189396Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2190151Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2190854Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2191514Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2192427Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2193362Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2193835Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2197047Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2197777Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2199131Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2199961Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2201119Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2202008Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2203293Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2204389Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2205192Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2206738Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2207332Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2208745Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2208935Z ('RERUN', {'yellow': True}) [1.9510s] [100%]
2025-12-04T10:35:20.2210803Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2212305Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2212877Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2213464Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2214136Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2214879Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2215564Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2216268Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2217036Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2217789Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2218517Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2219198Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2219869Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2220482Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2221232Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2221802Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2222537Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2223369Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2224266Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2225156Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2225967Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2226702Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2227374Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2227986Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2228557Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2229169Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2229758Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2230361Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2231026Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2231664Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2232407Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2233174Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2233926Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2234766Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2235403Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2236036Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2236764Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2237334Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2238079Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2238862Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2239529Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2240539Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2241447Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2241923Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2245093Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2245871Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2247226Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2248056Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2249206Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2250203Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2251355Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2252497Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2253308Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2254797Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2255291Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2256515Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2256700Z ('RERUN', {'yellow': True}) [0.4947s] [100%]
2025-12-04T10:35:20.2258661Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2260375Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2260951Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2261532Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2262312Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2262920Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2263640Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2264334Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2265086Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2265909Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2266640Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2267215Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2267893Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2268518Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2269243Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2269818Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2270461Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2271298Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2272177Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2273080Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2273784Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2274487Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2275162Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2275867Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2276442Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2277147Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2277717Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2278329Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2279003Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2279720Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2280388Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2281154Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2281908Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2282754Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2283394Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2283963Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2284734Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2285300Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2286106Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2286881Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2287546Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2288474Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2289401Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2289891Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2292864Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2293664Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2295060Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2295919Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2297060Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2298008Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2299225Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2300219Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2301001Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2302713Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2303192Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2304335Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2304469Z FAILED [0.4935s] [100%]
2025-12-04T10:35:20.2304478Z 
2025-12-04T10:35:20.2304726Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.2305238Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2305400Z Traceback (most recent call last):
2025-12-04T10:35:20.2305935Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2306241Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2306872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2307189Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2307999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2308246Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2308896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2309087Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2309762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2310176Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2310954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2311139Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2311839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2311991Z     return self._compile_to_module()
2025-12-04T10:35:20.2312614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2312832Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2313490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2313663Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2314382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2314683Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2324936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2325136Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2325809Z   File "/tmp/tmpwgcf43a9/bq/cbqrphnunnymv467uo6as7dukw46a3k6d5bvglvs5jhb6ylfyciy.py", line 137, in <module>
2025-12-04T10:35:20.2326429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2326570Z     kernel.precompile(
2025-12-04T10:35:20.2327293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2327440Z     self._precompile_worker()
2025-12-04T10:35:20.2328223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2328450Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2329217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2329480Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2330178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2330501Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2331064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2331497Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2331791Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2332707Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2332821Z ^
2025-12-04T10:35:20.2333418Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2333427Z 
2025-12-04T10:35:20.2334344Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2334352Z 
2025-12-04T10:35:20.2334359Z 
2025-12-04T10:35:20.2334640Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2335804Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2335874Z 
2025-12-04T10:35:20.2336226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2336508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2336690Z frames [('total', 1)]
2025-12-04T10:35:20.2336837Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2337395Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2337697Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2337819Z graph_break []
2025-12-04T10:35:20.2338307Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2338466Z Traceback (most recent call last):
2025-12-04T10:35:20.2338983Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2339484Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2340067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2340399Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2341123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2341377Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2342083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2342290Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2342998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2343447Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2344094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2344270Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2344823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2344971Z     return self._compile_to_module()
2025-12-04T10:35:20.2345617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2345848Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2346438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2346601Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2347184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2347476Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2348141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2348293Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2348834Z   File "/tmp/tmpx34jld5q/r5/cr5uyoyiv73za5p65b7bnhr7jcdy67h45xixxulikvhyeqjcd7wh.py", line 137, in <module>
2025-12-04T10:35:20.2349324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2349439Z     kernel.precompile(
2025-12-04T10:35:20.2350024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2350126Z     self._precompile_worker()
2025-12-04T10:35:20.2350717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2350873Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2351423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2351596Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2351974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2352177Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2352553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2352837Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2353040Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2353698Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2353774Z ^
2025-12-04T10:35:20.2354184Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2354189Z 
2025-12-04T10:35:20.2354797Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2354802Z 
2025-12-04T10:35:20.2354806Z 
2025-12-04T10:35:20.2354996Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2355999Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2356011Z 
2025-12-04T10:35:20.2356248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2356539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2356632Z frames [('total', 1)]
2025-12-04T10:35:20.2356734Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2357134Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2357378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2357472Z graph_break []
2025-12-04T10:35:20.2357650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2357736Z frames [('total', 1)]
2025-12-04T10:35:20.2357839Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2358018Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2358431Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2358513Z graph_break []
2025-12-04T10:35:20.2358633Z =================================== FAILURES ===================================
2025-12-04T10:35:20.2358979Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2359084Z Traceback (most recent call last):
2025-12-04T10:35:20.2359449Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2359656Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2360072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2360292Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2360774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2360937Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2361378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2361541Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2362007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2362278Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2362718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2362851Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2363322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2363426Z     return self._compile_to_module()
2025-12-04T10:35:20.2363843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2363984Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2364433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2364545Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2364966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2365168Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2365722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2365839Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2366268Z   File "/tmp/tmp_achl4kh/lh/clhzoaw7k5fjx7ijd5ieu5lsgnbesjgeljltesnnsgeuntuij5jc.py", line 137, in <module>
2025-12-04T10:35:20.2366662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2366769Z     kernel.precompile(
2025-12-04T10:35:20.2367241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2367386Z     self._precompile_worker()
2025-12-04T10:35:20.2367900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2368053Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2368570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2368741Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2369121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2369338Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2369711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2370005Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2370200Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2370806Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2370887Z ^
2025-12-04T10:35:20.2371280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2371332Z 
2025-12-04T10:35:20.2371953Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2371996Z 
2025-12-04T10:35:20.2372000Z 
2025-12-04T10:35:20.2372185Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2372934Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2372945Z 
2025-12-04T10:35:20.2373172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2373353Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2373446Z frames [('total', 1)]
2025-12-04T10:35:20.2373543Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2373980Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2374180Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2374265Z graph_break []
2025-12-04T10:35:20.2374459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2374544Z frames [('total', 1)]
2025-12-04T10:35:20.2374640Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2374833Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2375225Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2375302Z graph_break []
2025-12-04T10:35:20.2375488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2375592Z frames [('total', 1)]
2025-12-04T10:35:20.2375699Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2375908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2376297Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2376387Z graph_break []
2025-12-04T10:35:20.2376940Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml -
2025-12-04T10:35:20.2377131Z =========================== short test summary info ============================
2025-12-04T10:35:20.2377861Z FAILED [0.4935s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2378461Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2378540Z ^
2025-12-04T10:35:20.2378929Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2378936Z 
2025-12-04T10:35:20.2379662Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2379673Z 
2025-12-04T10:35:20.2379677Z 
2025-12-04T10:35:20.2379861Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2380602Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2380607Z 
2025-12-04T10:35:20.2380837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2381041Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.2381214Z ================== 1 failed, 187 deselected, 2 rerun in 2.97s ==================
2025-12-04T10:35:20.2381294Z Got exit code 1
2025-12-04T10:35:20.2381451Z Retrying single test...
2025-12-04T10:35:20.2381860Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml
2025-12-04T10:35:20.2381994Z ============================= test session starts ==============================
2025-12-04T10:35:20.2382292Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.2382389Z cachedir: .pytest_cache
2025-12-04T10:35:20.2382835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.2382943Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.2383028Z configfile: pytest.ini
2025-12-04T10:35:20.2383535Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.2383731Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.2384411Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2384503Z Running 1 items in this shard
2025-12-04T10:35:20.2384517Z 
2025-12-04T10:35:20.2385800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2386793Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2387164Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2387544Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2388038Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2388432Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2388890Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2389359Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2389853Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2390363Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2390841Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2391220Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2391661Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2392057Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2392507Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2392882Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2393335Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2393884Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2394471Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2395097Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2395554Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2396033Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2396469Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2396866Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2397244Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2397650Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2398041Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2398433Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2398876Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2399327Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2399760Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2400275Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2400775Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2401325Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2401738Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2402115Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2402616Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2402995Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2403493Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2403993Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2404479Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2405086Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2405712Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2406061Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2408508Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2408994Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2409886Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2410437Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2411197Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2411850Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2412615Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2413280Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2413829Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2414820Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2415139Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2415959Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2416137Z ('RERUN', {'yellow': True}) [1.9588s] [100%]
2025-12-04T10:35:20.2417385Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2418512Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2418893Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2419323Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2419832Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2420227Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2420693Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2421166Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2421668Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2422175Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2422650Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2423033Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2423491Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2423945Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2424352Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2424730Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2425150Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2425769Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2426356Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2426957Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2427403Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2427890Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2428399Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2428807Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2429224Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2429635Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2430034Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2430428Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2430880Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2431327Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2431756Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2432278Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2432772Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2433319Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2433733Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2434115Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2434614Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2434991Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2435573Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2436151Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2436700Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2437466Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2438145Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2438465Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2440516Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2441073Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2441975Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2442528Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2443286Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2443911Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2444683Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2445347Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2445877Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2446867Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2447528Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2448539Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2448795Z ('RERUN', {'yellow': True}) [0.4960s] [100%]
2025-12-04T10:35:20.2450792Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2452162Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2452676Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.2453188Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.2453784Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2454329Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2454973Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2455798Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2456515Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2457241Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2457847Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2458338Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.2458998Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2459735Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.2460273Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.2460795Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.2461371Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.2462173Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2463050Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2463961Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2464615Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2465280Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2466019Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2466625Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2467191Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.2467813Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2468376Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.2468964Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2469653Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2470237Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2470906Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2471696Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2472553Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2473410Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2474012Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2474569Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.2475295Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2475913Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.2476724Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2477381Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2478046Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2478924Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2479804Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2480285Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2483466Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2484222Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2485595Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2486409Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2487529Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2488403Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2489486Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2490559Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2491332Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2492773Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2493264Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2494497Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2494657Z FAILED [0.4923s] [100%]
2025-12-04T10:35:20.2494665Z 
2025-12-04T10:35:20.2494854Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.2495374Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2495557Z Traceback (most recent call last):
2025-12-04T10:35:20.2496125Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2496458Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2497038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2497357Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2498029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2498288Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2498973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2499284Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2499971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2500505Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2501161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2501365Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2501978Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2502153Z     return self._compile_to_module()
2025-12-04T10:35:20.2502785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2503011Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2503674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2503866Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2504517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2504837Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2505655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2505935Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2506603Z   File "/tmp/tmpdaksqrlq/ph/cphhvmbkzw5mj2i3mnvc2ta236jgrhd623fwou6xswtfm42c5snp.py", line 137, in <module>
2025-12-04T10:35:20.2507180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2507448Z     kernel.precompile(
2025-12-04T10:35:20.2508334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2508509Z     self._precompile_worker()
2025-12-04T10:35:20.2509304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2509542Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2510292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2510713Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2511294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2511627Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2512205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2512635Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2512940Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2513842Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2513979Z ^
2025-12-04T10:35:20.2514598Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2514617Z 
2025-12-04T10:35:20.2515585Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2515610Z 
2025-12-04T10:35:20.2515616Z 
2025-12-04T10:35:20.2515920Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2517196Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2517208Z 
2025-12-04T10:35:20.2517569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2517878Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2518027Z frames [('total', 1)]
2025-12-04T10:35:20.2518184Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2518818Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2519106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2519239Z graph_break []
2025-12-04T10:35:20.2519761Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2519939Z Traceback (most recent call last):
2025-12-04T10:35:20.2520479Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2520783Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2521383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2521689Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2522414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2522651Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2523259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2523542Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2524199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2524641Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2525341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2525546Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2526237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2526476Z     return self._compile_to_module()
2025-12-04T10:35:20.2527124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2527341Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2528024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2528205Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2528824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2529123Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2529895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2530069Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2530720Z   File "/tmp/tmpxbbpouox/km/ckm4qsvmoobmv5g76ztlzd62cjsxv3yhtad6raop7gduyk5xhu6z.py", line 137, in <module>
2025-12-04T10:35:20.2531286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2531424Z     kernel.precompile(
2025-12-04T10:35:20.2532128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2532351Z     self._precompile_worker()
2025-12-04T10:35:20.2533107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2533332Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2534075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2534346Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2534910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2535225Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2535805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2536301Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2536607Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2537515Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2537638Z ^
2025-12-04T10:35:20.2538234Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2538343Z 
2025-12-04T10:35:20.2539373Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2539462Z 
2025-12-04T10:35:20.2539468Z 
2025-12-04T10:35:20.2539757Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2540851Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2540863Z 
2025-12-04T10:35:20.2541226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2541508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2541644Z frames [('total', 1)]
2025-12-04T10:35:20.2541809Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2542571Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2542879Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2543018Z graph_break []
2025-12-04T10:35:20.2543301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2543437Z frames [('total', 1)]
2025-12-04T10:35:20.2543593Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2543890Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2544506Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2544632Z graph_break []
2025-12-04T10:35:20.2544834Z =================================== FAILURES ===================================
2025-12-04T10:35:20.2545331Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2545488Z Traceback (most recent call last):
2025-12-04T10:35:20.2546040Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2546359Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2547021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2547449Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2548130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2548386Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2549092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2549283Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2550006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2550433Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2551086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2551282Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2551845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2552003Z     return self._compile_to_module()
2025-12-04T10:35:20.2552585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2552804Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2553601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2553778Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2554490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2554809Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2555625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2555810Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2556492Z   File "/tmp/tmp8dywiimk/pf/cpfhxlm2pnfka2dekuhp4h6as7l6mrjl3kzzij77swb5j3kxxjkx.py", line 137, in <module>
2025-12-04T10:35:20.2557105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2557261Z     kernel.precompile(
2025-12-04T10:35:20.2558064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2558248Z     self._precompile_worker()
2025-12-04T10:35:20.2559024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2559267Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2560076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2560327Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2560891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2561223Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2561817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2562285Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2562586Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2563502Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2563631Z ^
2025-12-04T10:35:20.2564338Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2564351Z 
2025-12-04T10:35:20.2565299Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2565311Z 
2025-12-04T10:35:20.2565324Z 
2025-12-04T10:35:20.2565627Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2566801Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2566820Z 
2025-12-04T10:35:20.2567186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2567481Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2567628Z frames [('total', 1)]
2025-12-04T10:35:20.2567775Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2568413Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2568705Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2568833Z graph_break []
2025-12-04T10:35:20.2569236Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2569366Z frames [('total', 1)]
2025-12-04T10:35:20.2569511Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2569809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2570492Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2570626Z graph_break []
2025-12-04T10:35:20.2570932Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2571064Z frames [('total', 1)]
2025-12-04T10:35:20.2571225Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2571515Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2572101Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2572237Z graph_break []
2025-12-04T10:35:20.2573181Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml -
2025-12-04T10:35:20.2573419Z =========================== short test summary info ============================
2025-12-04T10:35:20.2574523Z FAILED [0.4923s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2575422Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2575562Z ^
2025-12-04T10:35:20.2576164Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2576176Z 
2025-12-04T10:35:20.2577041Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2577053Z 
2025-12-04T10:35:20.2577058Z 
2025-12-04T10:35:20.2577307Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2578158Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2578171Z 
2025-12-04T10:35:20.2578487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2578651Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.2578832Z ================== 1 failed, 187 deselected, 2 rerun in 2.98s ==================
2025-12-04T10:35:20.2578916Z Got exit code 1
2025-12-04T10:35:20.2579565Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2579929Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.2580329Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml
2025-12-04T10:35:20.2580477Z ============================= test session starts ==============================
2025-12-04T10:35:20.2580780Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.2580872Z cachedir: .pytest_cache
2025-12-04T10:35:20.2581328Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.2581432Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.2581520Z configfile: pytest.ini
2025-12-04T10:35:20.2581989Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.2582239Z collecting ... collected 188 items / 29 deselected / 159 selected
2025-12-04T10:35:20.2582362Z stepcurrent: skipping 29 already run items.
2025-12-04T10:35:20.2582455Z Running 159 items in this shard
2025-12-04T10:35:20.2582501Z 
2025-12-04T10:35:20.2583788Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2584726Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2585097Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2585579Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2585984Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2586444Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2586904Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2587396Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2587822Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2588295Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2588691Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2589058Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2589608Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2590111Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2590625Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2591132Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2591584Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2592032Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2592462Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2592865Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2593260Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2593950Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2594447Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2595011Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2595624Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2596143Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2596478Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2597078Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2597602Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2598170Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2598771Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2599174Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2599583Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2599979Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2600521Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2601007Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2601470Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2601964Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2602412Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2602870Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2603285Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2603686Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2604086Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2604775Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2605276Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2605704Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2606179Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2606609Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2606994Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2607420Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2608195Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2608747Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2609197Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2609698Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2610193Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2610690Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2611116Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2611509Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2612001Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2612395Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2612944Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2613414Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2613945Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2614436Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2614909Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2615209Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2617204Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2617718Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2618670Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2619247Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2620010Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2620631Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2621382Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2622044Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2622560Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2623507Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2623816Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2624591Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2624703Z ('RERUN', {'yellow': True}) [1.7863s] [  0%]
2025-12-04T10:35:20.2625952Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2626887Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2627258Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2627645Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2628043Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2628511Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2628973Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2629471Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2629931Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2630440Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2630827Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2631191Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2631702Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2632245Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2632762Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2633265Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2633721Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2634166Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2634592Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2634996Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2635407Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2636098Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2636591Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2637089Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2637698Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2638216Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2638551Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2639109Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2639631Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2640200Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2640799Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2641365Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2641808Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2642205Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2642748Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2643193Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2643697Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2644203Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2644655Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2645115Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2645528Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2645980Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2646377Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2647068Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2647529Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2647988Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2648377Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2648808Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2649194Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2649627Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2650081Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2650499Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2650948Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2651448Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2651944Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2652484Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2652906Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2653336Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2653827Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2654231Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2654722Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2655235Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2655817Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2656312Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2656798Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2657101Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2659104Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2659614Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2660528Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2661062Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2661843Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2662425Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2663175Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2663843Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2664410Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2665364Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2665721Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2673582Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2673719Z ('RERUN', {'yellow': True}) [0.3333s] [  0%]
2025-12-04T10:35:20.2674960Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2675955Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2676335Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2676721Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2677113Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2677578Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2678040Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2678536Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2679008Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2679484Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2679871Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2680234Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2680746Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2681250Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2681760Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2682254Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2682706Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2683153Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2683623Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2684074Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2684475Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2685167Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2685669Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2686211Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2686823Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2687343Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2687685Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2688240Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2688759Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2689339Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2689945Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2690351Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2690824Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2691221Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2691758Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2692208Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2692670Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2693163Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2693613Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2694070Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2694489Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2694938Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2695336Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2696070Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2696523Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2696941Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2697326Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2697803Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2698188Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2698622Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2699145Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2699565Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2700012Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2700517Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2701010Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2701506Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2701978Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2702367Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2702855Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2703246Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2703734Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2704196Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2704727Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2705217Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2705687Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2706037Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2708266Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2708816Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2709769Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2710310Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2711076Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2711656Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2712406Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2713070Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2713587Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2714582Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2714890Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2715709Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2715802Z FAILED [0.3334s] [  0%]
2025-12-04T10:35:20.2715807Z 
2025-12-04T10:35:20.2715934Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.2716289Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.2716396Z Traceback (most recent call last):
2025-12-04T10:35:20.2716768Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2716971Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2717383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2717603Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2718790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2718967Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2719407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2719574Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2720042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2720322Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2720769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2720906Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2721316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2721473Z     return self._compile_to_module()
2025-12-04T10:35:20.2721893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2722035Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2722493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2722605Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2723037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2723232Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2723737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2723855Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2724272Z   File "/tmp/tmp_87ew5dr/w7/cw74tqprbz5gx3g3n7v4osjyzut7qflyrn4kazjyhdhemaxm5adp.py", line 65, in <module>
2025-12-04T10:35:20.2724671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2724780Z     kernel.precompile(
2025-12-04T10:35:20.2725252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2725409Z     self._precompile_worker()
2025-12-04T10:35:20.2725967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2726119Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2726635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2726810Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2727200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2727410Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2727785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2728081Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2728275Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2728833Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2728915Z ^
2025-12-04T10:35:20.2729313Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2729363Z 
2025-12-04T10:35:20.2729982Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2730029Z 
2025-12-04T10:35:20.2730033Z 
2025-12-04T10:35:20.2730221Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2730988Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2730993Z 
2025-12-04T10:35:20.2731221Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2731405Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2731505Z frames [('total', 1)]
2025-12-04T10:35:20.2731602Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2732077Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2732268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2732353Z graph_break []
2025-12-04T10:35:20.2732704Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.2732811Z Traceback (most recent call last):
2025-12-04T10:35:20.2733174Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2733375Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2733788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2734012Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2734452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2734612Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2735054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2735180Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2735674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2735956Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2736398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2736525Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2736935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2737037Z     return self._compile_to_module()
2025-12-04T10:35:20.2737453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2737590Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2738035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2738143Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2738563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2738765Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2739322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2739481Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2739894Z   File "/tmp/tmp8ydh_584/42/c42lcyd6rv2t2ga7l6unyb64xenips7osbnksuoe7y54utn6lbit.py", line 65, in <module>
2025-12-04T10:35:20.2740288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2740433Z     kernel.precompile(
2025-12-04T10:35:20.2740906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2741003Z     self._precompile_worker()
2025-12-04T10:35:20.2741516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2741665Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2742179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2742390Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2742772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2742988Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2743359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2743656Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2743848Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2744407Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2744493Z ^
2025-12-04T10:35:20.2744884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2744893Z 
2025-12-04T10:35:20.2745509Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2745516Z 
2025-12-04T10:35:20.2745520Z 
2025-12-04T10:35:20.2745704Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2746496Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2746509Z 
2025-12-04T10:35:20.2746739Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2746925Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2747017Z frames [('total', 1)]
2025-12-04T10:35:20.2747113Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2747521Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2747718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2747801Z graph_break []
2025-12-04T10:35:20.2747983Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2748087Z frames [('total', 1)]
2025-12-04T10:35:20.2748189Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2748395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2748796Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2748884Z graph_break []
2025-12-04T10:35:20.2749019Z =================================== FAILURES ===================================
2025-12-04T10:35:20.2749365Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.2749521Z Traceback (most recent call last):
2025-12-04T10:35:20.2749895Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2750134Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2750571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2750787Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2751234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2751416Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2751858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2751997Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2752494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2752768Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2753232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2753362Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2753776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2753893Z     return self._compile_to_module()
2025-12-04T10:35:20.2754311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2754463Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2754912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2755021Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2755463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2755666Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2756215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2756326Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2756762Z   File "/tmp/tmpq6fyvb2n/a5/ca52id2idhnuzt4bfc5ydz3tk3lpmdex4bzcrgy5qzrlundrd3qc.py", line 65, in <module>
2025-12-04T10:35:20.2757203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2757339Z     kernel.precompile(
2025-12-04T10:35:20.2757959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2758098Z     self._precompile_worker()
2025-12-04T10:35:20.2758676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2758833Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2759340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2759505Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2759889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2760093Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2760537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2760825Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2761057Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2761617Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2761689Z ^
2025-12-04T10:35:20.2762083Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2762094Z 
2025-12-04T10:35:20.2762702Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2762707Z 
2025-12-04T10:35:20.2762711Z 
2025-12-04T10:35:20.2762896Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2763701Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2763709Z 
2025-12-04T10:35:20.2763935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2764122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2764209Z frames [('total', 1)]
2025-12-04T10:35:20.2764305Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2764718Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2764905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2764990Z graph_break []
2025-12-04T10:35:20.2765167Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2765252Z frames [('total', 1)]
2025-12-04T10:35:20.2765357Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2765542Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2765937Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2766022Z graph_break []
2025-12-04T10:35:20.2766197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2766326Z frames [('total', 1)]
2025-12-04T10:35:20.2766426Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2766608Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2767006Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2767092Z graph_break []
2025-12-04T10:35:20.2767655Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml -
2025-12-04T10:35:20.2767801Z =========================== short test summary info ============================
2025-12-04T10:35:20.2768530Z FAILED [0.3334s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2769090Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2769161Z ^
2025-12-04T10:35:20.2769552Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2769557Z 
2025-12-04T10:35:20.2770165Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2770214Z 
2025-12-04T10:35:20.2770218Z 
2025-12-04T10:35:20.2770398Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2771157Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2771227Z 
2025-12-04T10:35:20.2771452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2771613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.2771779Z ================== 1 failed, 29 deselected, 2 rerun in 2.49s ===================
2025-12-04T10:35:20.2771864Z Got exit code 1
2025-12-04T10:35:20.2771959Z Retrying single test...
2025-12-04T10:35:20.2772362Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml
2025-12-04T10:35:20.2772561Z ============================= test session starts ==============================
2025-12-04T10:35:20.2772863Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.2772957Z cachedir: .pytest_cache
2025-12-04T10:35:20.2773416Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.2773521Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.2773609Z configfile: pytest.ini
2025-12-04T10:35:20.2774075Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.2774265Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.2774950Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2775059Z Running 1 items in this shard
2025-12-04T10:35:20.2775064Z 
2025-12-04T10:35:20.2776274Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2777259Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2777629Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2778020Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2778412Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2778862Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2779426Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2779927Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2780356Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2780826Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2781271Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2781638Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2782188Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2782695Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2783209Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2783707Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2784201Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2784653Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2785079Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2785486Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2785937Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2786624Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2787077Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2787584Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2788198Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2788758Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2789100Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2789647Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2790187Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2790761Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2791368Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2791772Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2792180Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2792627Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2793166Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2793663Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2794128Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2794622Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2795068Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2795555Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2795974Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2796381Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2796787Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2797474Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2797935Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2798359Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2798746Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2799178Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2799562Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2800033Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2800487Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2800905Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2801356Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2801859Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2802359Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2802854Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2803275Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2803672Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2804209Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2804644Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2805129Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2805602Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2806179Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2806667Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2807176Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2807483Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2809765Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2810232Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2811124Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2811741Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2812500Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2813086Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2813833Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2814493Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2815017Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2816002Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2816404Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2817163Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2817337Z ('RERUN', {'yellow': True}) [1.7968s] [100%]
2025-12-04T10:35:20.2818489Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2819495Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2819925Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2820308Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2820695Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2821149Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2821619Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2822109Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2822540Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2823012Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2823389Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2823891Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2824490Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2825151Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2825770Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2826263Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2826716Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2827167Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2827587Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2827996Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2828394Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2829146Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2829653Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2830156Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2830764Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2831281Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2831661Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2832216Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2832742Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2833317Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2833930Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2834338Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2834752Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2835158Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2835793Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2836246Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2836709Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2837209Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2837663Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2838111Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2838649Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2839174Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2839583Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2840268Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2840787Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2841249Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2841637Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2842064Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2842446Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2842868Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2843365Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2843782Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2844228Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2844732Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2845226Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2845720Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2846143Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2846534Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2847021Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2847455Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2847941Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2848395Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2848936Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2849422Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2849894Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2850198Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2852129Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2852664Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2853560Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2854088Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2854885Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2855468Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2856271Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2856934Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2857449Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2858393Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2858699Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2859604Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2859718Z ('RERUN', {'yellow': True}) [0.3349s] [100%]
2025-12-04T10:35:20.2860875Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2861810Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2862178Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2862565Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2862951Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2863402Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2863866Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2864404Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2864872Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2865345Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2865757Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2866142Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2866647Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2867193Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2867707Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2868212Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2868661Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2869107Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2869528Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2869937Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2870339Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2871064Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2871514Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2872014Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2872623Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2873148Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2873491Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2874050Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2874570Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2875138Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2875790Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2876237Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2876648Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2877048Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2877584Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2878040Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2878550Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2879053Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2879502Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2879953Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2880381Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2880785Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2881194Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2881882Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2882385Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2882806Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2883197Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2883629Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2884021Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2884447Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2884904Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2885323Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2885824Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2886323Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2886866Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2887361Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2887822Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2888215Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2888705Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2889099Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2889625Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2890081Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2890621Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2891111Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2891579Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2891879Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.2893853Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2894313Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2895202Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2895764Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2896556Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2897130Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2897883Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2898547Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2899205Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2900196Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2900504Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2901269Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2901362Z FAILED [0.3341s] [100%]
2025-12-04T10:35:20.2901367Z 
2025-12-04T10:35:20.2901555Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.2901910Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.2902017Z Traceback (most recent call last):
2025-12-04T10:35:20.2902384Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2902585Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2903008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2903230Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2903670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2903832Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2904272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2904392Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2904856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2905128Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2905621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2905750Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2906161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2906269Z     return self._compile_to_module()
2025-12-04T10:35:20.2906688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2906823Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2907270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2907378Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2908193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2908409Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2908906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2909015Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2909449Z   File "/tmp/tmpiuodrvvc/ee/ceemmtj5ftz52oo4ru2oymqs5scxwwz63ctjvvrjazhx6mw3w7ol.py", line 65, in <module>
2025-12-04T10:35:20.2909939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2910045Z     kernel.precompile(
2025-12-04T10:35:20.2910574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2910672Z     self._precompile_worker()
2025-12-04T10:35:20.2911189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2911338Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2911849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2912014Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2912448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2912661Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2913035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2913331Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2913522Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2914081Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2914160Z ^
2025-12-04T10:35:20.2914551Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2914556Z 
2025-12-04T10:35:20.2915205Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2915216Z 
2025-12-04T10:35:20.2915221Z 
2025-12-04T10:35:20.2915478Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2916474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2916488Z 
2025-12-04T10:35:20.2916846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2917112Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2917242Z frames [('total', 1)]
2025-12-04T10:35:20.2917371Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2917891Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2918125Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2918210Z graph_break []
2025-12-04T10:35:20.2918557Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.2918668Z Traceback (most recent call last):
2025-12-04T10:35:20.2919026Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2919228Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2919644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2919856Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2920294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2920519Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2920966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2921088Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2921585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2921866Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2922318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2922443Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2922851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2922948Z     return self._compile_to_module()
2025-12-04T10:35:20.2923406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2923545Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2923984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2924096Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2924515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2924713Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2925208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2925310Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2925795Z   File "/tmp/tmpeq2wwevp/qy/cqy3s47ftnsg44gliter2wak4p2qstrv2ijtjlg5mwyzsbmolner.py", line 65, in <module>
2025-12-04T10:35:20.2926192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2926289Z     kernel.precompile(
2025-12-04T10:35:20.2926760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2926853Z     self._precompile_worker()
2025-12-04T10:35:20.2927411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2927559Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2928061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2928231Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2928614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2928826Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2929196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2929480Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2929675Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2930231Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2930304Z ^
2025-12-04T10:35:20.2930692Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2930696Z 
2025-12-04T10:35:20.2931303Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2931352Z 
2025-12-04T10:35:20.2931363Z 
2025-12-04T10:35:20.2931544Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2932335Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2932341Z 
2025-12-04T10:35:20.2932579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2932758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2932848Z frames [('total', 1)]
2025-12-04T10:35:20.2932943Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2933342Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2933579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2933665Z graph_break []
2025-12-04T10:35:20.2933880Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2934008Z frames [('total', 1)]
2025-12-04T10:35:20.2934136Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2934377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2934878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2934957Z graph_break []
2025-12-04T10:35:20.2935082Z =================================== FAILURES ===================================
2025-12-04T10:35:20.2935422Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.2935524Z Traceback (most recent call last):
2025-12-04T10:35:20.2935890Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2936086Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2936498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2936719Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2937216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2937383Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2937820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2937938Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2938396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2938676Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2939205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2939329Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2939735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2939842Z     return self._compile_to_module()
2025-12-04T10:35:20.2940252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2940390Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2940827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2940994Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2941451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2941657Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2942222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2942331Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2942766Z   File "/tmp/tmpf3miv1l8/xt/cxtrioydmgzln76ly23hxyv3bhaf4bk6byzhnyymkpxm5wwv4owv.py", line 65, in <module>
2025-12-04T10:35:20.2943169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2943259Z     kernel.precompile(
2025-12-04T10:35:20.2943727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2943873Z     self._precompile_worker()
2025-12-04T10:35:20.2944380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2944548Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2945105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2945290Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2945720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2945932Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2946304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2946598Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2946793Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2947353Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2947430Z ^
2025-12-04T10:35:20.2947823Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2947828Z 
2025-12-04T10:35:20.2948498Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2948503Z 
2025-12-04T10:35:20.2948507Z 
2025-12-04T10:35:20.2948689Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2949446Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2949454Z 
2025-12-04T10:35:20.2949678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2949863Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2949946Z frames [('total', 1)]
2025-12-04T10:35:20.2950039Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2950445Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2950630Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2950708Z graph_break []
2025-12-04T10:35:20.2950890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2950970Z frames [('total', 1)]
2025-12-04T10:35:20.2951069Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2951301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2951698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2951824Z graph_break []
2025-12-04T10:35:20.2952000Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2952082Z frames [('total', 1)]
2025-12-04T10:35:20.2952184Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2952367Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2952758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.2952846Z graph_break []
2025-12-04T10:35:20.2953405Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml -
2025-12-04T10:35:20.2953624Z =========================== short test summary info ============================
2025-12-04T10:35:20.2954404Z FAILED [0.3341s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2955001Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2955087Z ^
2025-12-04T10:35:20.2955527Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2955533Z 
2025-12-04T10:35:20.2956217Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2956222Z 
2025-12-04T10:35:20.2956225Z 
2025-12-04T10:35:20.2956420Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2957227Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2957234Z 
2025-12-04T10:35:20.2957471Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2957632Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.2957864Z ================== 1 failed, 187 deselected, 2 rerun in 2.50s ==================
2025-12-04T10:35:20.2957951Z Got exit code 1
2025-12-04T10:35:20.2958162Z Retrying single test...
2025-12-04T10:35:20.2958567Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml
2025-12-04T10:35:20.2958702Z ============================= test session starts ==============================
2025-12-04T10:35:20.2959007Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.2959101Z cachedir: .pytest_cache
2025-12-04T10:35:20.2959548Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.2959660Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.2959750Z configfile: pytest.ini
2025-12-04T10:35:20.2960217Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.2960403Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.2961081Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.2961179Z Running 1 items in this shard
2025-12-04T10:35:20.2961183Z 
2025-12-04T10:35:20.2962389Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.2963372Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.2963745Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.2964132Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.2964518Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.2965024Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2965500Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2966040Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2966471Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.2966941Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2967319Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.2973136Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.2973675Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2974188Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2974776Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.2975270Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2975733Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2976188Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2976618Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2977024Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2977417Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2978115Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2978560Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2979219Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2979835Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.2980399Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.2980737Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.2981287Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.2981854Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.2982430Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.2983035Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.2983445Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.2983844Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.2984251Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.2984791Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.2985248Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.2985715Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.2986262Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.2986713Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.2987157Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2987586Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.2987991Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.2988396Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.2989090Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.2989541Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.2989974Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.2990435Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.2990866Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.2991297Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.2991717Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.2992180Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.2992596Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.2993042Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.2993585Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2994084Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.2994587Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.2995046Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.2995570Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.2996249Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.2996667Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.2997154Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.2997606Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.2998212Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.2998701Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.2999183Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.2999487Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3001425Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3001881Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3002813Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3003390Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3004148Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3004731Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3005610Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3006272Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3006791Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3008012Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3008332Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3009099Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3009214Z ('RERUN', {'yellow': True}) [1.7780s] [100%]
2025-12-04T10:35:20.3010464Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3011399Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3011761Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.3012147Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3012532Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3012983Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3013451Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3013940Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3014364Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.3014906Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3015284Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3015758Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3016272Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3016773Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3017282Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3017828Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3018284Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3018732Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3019212Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3019620Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3020013Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3020706Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3021154Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3021663Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3022320Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3022837Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3023175Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3023728Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3024260Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3024831Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3025438Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3025842Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3026286Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3026691Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3027265Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3027727Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3028195Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3028701Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3029191Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3029641Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3030073Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3030486Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3030889Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3031582Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3032034Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3032459Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3032849Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3033431Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3033824Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3034240Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3034704Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3035130Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3035619Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3036135Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3036632Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3037141Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.3037557Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3038000Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3038486Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3038922Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3039408Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3039869Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3040403Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.3040933Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3041414Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.3041713Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3043645Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3044100Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3045030Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3045581Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3046370Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3046956Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3047703Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3048362Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3048878Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3049871Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3050274Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3051090Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3051205Z ('RERUN', {'yellow': True}) [0.3334s] [100%]
2025-12-04T10:35:20.3052360Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3053337Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3053706Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.3054094Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3054493Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3054950Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3055412Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3055904Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3056328Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.3056797Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3057220Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3057591Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3058093Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3058595Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3059175Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3059671Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3060138Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3060634Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3061174Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3061606Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3062063Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3062790Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3063241Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3063749Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3064359Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3064915Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3065254Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3065850Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3066381Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3066953Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3067557Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3067968Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3068382Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3068779Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3069362Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3069816Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3070279Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3070778Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3071230Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3071678Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3072100Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3072506Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3072906Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3073640Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3074127Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3074558Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3074946Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3075383Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3075822Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3076309Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3076775Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3077201Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3077657Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3078157Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3078653Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3079154Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.3079574Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3079975Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3080506Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3080901Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3081392Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3081850Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3082394Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.3082889Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3083373Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.3083676Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3085614Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3086149Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3087039Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3087608Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3088363Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3088944Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3089785Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3090645Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3091180Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3092127Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3092446Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3093278Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3093379Z FAILED [0.3342s] [100%]
2025-12-04T10:35:20.3093384Z 
2025-12-04T10:35:20.3093514Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.3093879Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.3093993Z Traceback (most recent call last):
2025-12-04T10:35:20.3094362Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3094582Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3095008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3095239Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3095688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3095858Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3096308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3096484Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3097008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3097344Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3097797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3097941Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3098359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3098466Z     return self._compile_to_module()
2025-12-04T10:35:20.3098892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3099109Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3099618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3099737Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3100212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3100421Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3100937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3101052Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3101501Z   File "/tmp/tmpdpvpnbt1/hi/chi37at57h7wyjtyeit4oefrahv6osfprn2coaj4v5l45t7tvucz.py", line 65, in <module>
2025-12-04T10:35:20.3101910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3102019Z     kernel.precompile(
2025-12-04T10:35:20.3102503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3102612Z     self._precompile_worker()
2025-12-04T10:35:20.3103142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3103303Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3103874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3104057Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3104448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3104669Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3105057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3105349Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3105583Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3106177Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3106264Z ^
2025-12-04T10:35:20.3106666Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3106671Z 
2025-12-04T10:35:20.3107289Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3107356Z 
2025-12-04T10:35:20.3107360Z 
2025-12-04T10:35:20.3107553Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3108624Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.3108725Z 
2025-12-04T10:35:20.3108976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3109175Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3109275Z frames [('total', 1)]
2025-12-04T10:35:20.3109379Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3109793Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3109994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3110088Z graph_break []
2025-12-04T10:35:20.3110500Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.3110620Z Traceback (most recent call last):
2025-12-04T10:35:20.3110988Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3111201Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3111624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3111845Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3112300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3112470Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3112921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3113056Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3113524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3113816Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3114266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3114457Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3114883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3114992Z     return self._compile_to_module()
2025-12-04T10:35:20.3115426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3115598Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3116070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3116196Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3116630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3116839Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3117350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3117464Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3117922Z   File "/tmp/tmp4mwz5oo3/ed/ceddn4j5nx7rvgwuipwrbnpefara2clksjxvaadsdvq7tmyue5xk.py", line 65, in <module>
2025-12-04T10:35:20.3118323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3118526Z     kernel.precompile(
2025-12-04T10:35:20.3119017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3119194Z     self._precompile_worker()
2025-12-04T10:35:20.3119724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3119881Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3120399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3120594Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3120982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3121206Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3121633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3121928Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3122138Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3122705Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3122790Z ^
2025-12-04T10:35:20.3123194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3123199Z 
2025-12-04T10:35:20.3123816Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3123823Z 
2025-12-04T10:35:20.3123827Z 
2025-12-04T10:35:20.3124026Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3124787Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.3124794Z 
2025-12-04T10:35:20.3125036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3125273Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3125368Z frames [('total', 1)]
2025-12-04T10:35:20.3125501Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3125939Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3126139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3126230Z graph_break []
2025-12-04T10:35:20.3126421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3126517Z frames [('total', 1)]
2025-12-04T10:35:20.3126622Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3126815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3127227Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3127313Z graph_break []
2025-12-04T10:35:20.3127451Z =================================== FAILURES ===================================
2025-12-04T10:35:20.3127804Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _
2025-12-04T10:35:20.3127913Z Traceback (most recent call last):
2025-12-04T10:35:20.3128286Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3128489Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3128959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3129184Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3129677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3129857Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3130302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3130430Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3130902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3131184Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3131692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3131863Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3132283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3132400Z     return self._compile_to_module()
2025-12-04T10:35:20.3132823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3132966Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3133424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3133539Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3133973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3134188Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3134695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3134818Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3135272Z   File "/tmp/tmpn3ouw87o/n5/cn5gxgv7yjyaqengtddidqhodyib6kib2wiqnryavjijb7mvuxdj.py", line 65, in <module>
2025-12-04T10:35:20.3135724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3135826Z     kernel.precompile(
2025-12-04T10:35:20.3136307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3136415Z     self._precompile_worker()
2025-12-04T10:35:20.3136934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3137094Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3137615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3137791Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3138186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3138403Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3138786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3139177Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3139380Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3140005Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3140124Z ^
2025-12-04T10:35:20.3140525Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3140530Z 
2025-12-04T10:35:20.3141157Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3141162Z 
2025-12-04T10:35:20.3141166Z 
2025-12-04T10:35:20.3141358Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3142129Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.3142136Z 
2025-12-04T10:35:20.3142432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3142624Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3142727Z frames [('total', 1)]
2025-12-04T10:35:20.3142829Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3143245Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3143446Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3143536Z graph_break []
2025-12-04T10:35:20.3143727Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3143817Z frames [('total', 1)]
2025-12-04T10:35:20.3143919Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3144115Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3144522Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3144619Z graph_break []
2025-12-04T10:35:20.3144806Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3144900Z frames [('total', 1)]
2025-12-04T10:35:20.3145006Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3145197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3145731Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3145825Z graph_break []
2025-12-04T10:35:20.3146392Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml -
2025-12-04T10:35:20.3146551Z =========================== short test summary info ============================
2025-12-04T10:35:20.3147293Z FAILED [0.3342s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3147857Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3147943Z ^
2025-12-04T10:35:20.3148347Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3148351Z 
2025-12-04T10:35:20.3148973Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3148978Z 
2025-12-04T10:35:20.3148981Z 
2025-12-04T10:35:20.3149171Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3149938Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.3149986Z 
2025-12-04T10:35:20.3150224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3150426Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3150615Z ================== 1 failed, 187 deselected, 2 rerun in 2.48s ==================
2025-12-04T10:35:20.3150705Z Got exit code 1
2025-12-04T10:35:20.3151257Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda
2025-12-04T10:35:20.3151625Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.3152036Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml
2025-12-04T10:35:20.3152233Z ============================= test session starts ==============================
2025-12-04T10:35:20.3152574Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3152675Z cachedir: .pytest_cache
2025-12-04T10:35:20.3153140Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3153249Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3153351Z configfile: pytest.ini
2025-12-04T10:35:20.3153823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3154024Z collecting ... collected 188 items / 30 deselected / 158 selected
2025-12-04T10:35:20.3154155Z stepcurrent: skipping 30 already run items.
2025-12-04T10:35:20.3154256Z Running 158 items in this shard
2025-12-04T10:35:20.3154260Z 
2025-12-04T10:35:20.3155523Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3156651Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3157024Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3157421Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3157817Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3158291Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3158792Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3159447Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3160103Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3160747Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3161279Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3162176Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3162838Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3163334Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3163897Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3164362Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3164820Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3165314Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3165783Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3166187Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3166629Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3167284Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3167891Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3168484Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3168950Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3169414Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3169805Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3170238Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3170627Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3171057Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3171521Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3171946Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3172408Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3172924Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3173432Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3174000Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3174430Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3174882Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3175384Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3175839Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3176336Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3176848Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3177499Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3178000Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3178454Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3179128Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3179461Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3181768Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3182243Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3183152Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3183706Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3184481Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3185067Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3185885Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3186596Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3187259Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3188337Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3188661Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3189475Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3189593Z ('RERUN', {'yellow': True}) [1.8868s] [  0%]
2025-12-04T10:35:20.3190847Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3191916Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3192296Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3192684Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3193086Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3193551Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3194060Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3194569Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3195075Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3195565Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3195953Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3196498Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3196963Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3197433Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3197939Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3198443Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3198945Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3199368Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3199785Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3200200Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3200704Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3201426Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3202094Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3202700Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3203162Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3203581Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3203980Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3204407Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3204801Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3205226Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3205772Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3206202Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3206653Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3207173Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3207683Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3208511Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3208962Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3209363Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3209869Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3210371Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3210873Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3211410Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3212029Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3212563Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3213122Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3213913Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3214239Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3216653Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3217129Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3218097Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3218648Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3219472Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3220072Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3220832Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3221506Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3222033Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3223109Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3223483Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3224294Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3224420Z ('RERUN', {'yellow': True}) [0.4077s] [  0%]
2025-12-04T10:35:20.3225706Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3226836Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3227210Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3227602Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3228008Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3228469Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3228946Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3229454Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3229968Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3230446Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3230885Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3231437Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3231890Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3232376Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3232876Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3233339Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3233805Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3234229Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3234658Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3235102Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3235561Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3236286Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3236882Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3237484Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3237983Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3238411Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3238806Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3239236Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3239637Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3240053Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3240521Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3240950Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3241400Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3241923Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3242469Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3242963Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3243391Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3243796Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3244300Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3244696Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3245202Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3245692Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3246323Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3246878Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3247367Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3247991Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3248304Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3250623Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3251102Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3252010Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3252558Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3253332Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3253921Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3254728Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3255397Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3255927Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3257006Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3257327Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3258107Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3258199Z FAILED [0.4075s] [  0%]
2025-12-04T10:35:20.3258204Z 
2025-12-04T10:35:20.3258383Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.3258747Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3258861Z Traceback (most recent call last):
2025-12-04T10:35:20.3259326Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3259540Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3259970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3260208Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3260653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3260830Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3261320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3261456Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3261930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3262219Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3262682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3262823Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3263248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3263368Z     return self._compile_to_module()
2025-12-04T10:35:20.3263795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3263953Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3264413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3264533Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3264973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3265221Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3265732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3265855Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3266292Z   File "/tmp/tmprjg0k_q7/wd/cwdgp3iebwu6yvrowg3ani7upfl4zqiwupk36gvhxuvnslp34u2z.py", line 137, in <module>
2025-12-04T10:35:20.3266703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3266812Z     kernel.precompile(
2025-12-04T10:35:20.3267296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3267410Z     self._precompile_worker()
2025-12-04T10:35:20.3267930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3268094Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3268630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3268811Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3269212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3269478Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3269864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3270207Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3270416Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3271127Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3271223Z ^
2025-12-04T10:35:20.3271628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3271633Z 
2025-12-04T10:35:20.3272307Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3272315Z 
2025-12-04T10:35:20.3272319Z 
2025-12-04T10:35:20.3272510Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3273281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3273286Z 
2025-12-04T10:35:20.3273524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3273719Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3273817Z frames [('total', 1)]
2025-12-04T10:35:20.3273922Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3274345Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3274543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3274633Z graph_break []
2025-12-04T10:35:20.3274991Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3275102Z Traceback (most recent call last):
2025-12-04T10:35:20.3275474Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3275716Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3276210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3276440Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3276889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3277059Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3277515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3277644Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3278112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3278400Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3278855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3278995Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3279414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3279525Z     return self._compile_to_module()
2025-12-04T10:35:20.3279956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3280146Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3280602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3280757Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3281188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3281399Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3281907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3282033Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3282469Z   File "/tmp/tmpf736ua_8/4v/c4vspfek4zdn65oysipklcf5zsstvgb4wxbqjpn3wg444jmx3kwc.py", line 137, in <module>
2025-12-04T10:35:20.3282917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3283025Z     kernel.precompile(
2025-12-04T10:35:20.3283507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3283613Z     self._precompile_worker()
2025-12-04T10:35:20.3284145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3284303Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3284836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3285015Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3285405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3285639Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3286137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3286547Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3286817Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3287844Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3287987Z ^
2025-12-04T10:35:20.3295067Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3295082Z 
2025-12-04T10:35:20.3295921Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3295942Z 
2025-12-04T10:35:20.3295947Z 
2025-12-04T10:35:20.3296202Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3296973Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3296979Z 
2025-12-04T10:35:20.3297233Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3297431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3297533Z frames [('total', 1)]
2025-12-04T10:35:20.3297644Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3298062Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3298349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3298435Z graph_break []
2025-12-04T10:35:20.3298632Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3298801Z frames [('total', 1)]
2025-12-04T10:35:20.3298903Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3299167Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3299581Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3299668Z graph_break []
2025-12-04T10:35:20.3299806Z =================================== FAILURES ===================================
2025-12-04T10:35:20.3300162Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3300277Z Traceback (most recent call last):
2025-12-04T10:35:20.3300709Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3300915Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3301361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3301586Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3302036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3302216Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3302661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3302796Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3303269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3303561Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3304023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3304157Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3304575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3304736Z     return self._compile_to_module()
2025-12-04T10:35:20.3305160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3305313Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3305809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3305932Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3306371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3306575Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3307092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3307206Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3307658Z   File "/tmp/tmptwh4ft5l/ws/cwstn6wee6ekvkbrcdhd63wn7ggrie4edg4ixsfv65nh2xrjaqb4.py", line 137, in <module>
2025-12-04T10:35:20.3308337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3308467Z     kernel.precompile(
2025-12-04T10:35:20.3309102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3309333Z     self._precompile_worker()
2025-12-04T10:35:20.3309854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3310075Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3310593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3310767Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3311171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3311385Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3311779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3312131Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3312339Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3313054Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3313134Z ^
2025-12-04T10:35:20.3313536Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3313550Z 
2025-12-04T10:35:20.3314173Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3314178Z 
2025-12-04T10:35:20.3314182Z 
2025-12-04T10:35:20.3314375Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3315152Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3315159Z 
2025-12-04T10:35:20.3315396Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3315627Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3315740Z frames [('total', 1)]
2025-12-04T10:35:20.3315850Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3316326Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3316526Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3316614Z graph_break []
2025-12-04T10:35:20.3316818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3316910Z frames [('total', 1)]
2025-12-04T10:35:20.3317025Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3317217Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3317621Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3317720Z graph_break []
2025-12-04T10:35:20.3317911Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3318001Z frames [('total', 1)]
2025-12-04T10:35:20.3318110Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3318305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3318715Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3318807Z graph_break []
2025-12-04T10:35:20.3319376Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml -
2025-12-04T10:35:20.3319581Z =========================== short test summary info ============================
2025-12-04T10:35:20.3320319Z FAILED [0.4075s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3321071Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3321150Z ^
2025-12-04T10:35:20.3321554Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3321559Z 
2025-12-04T10:35:20.3322182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3322189Z 
2025-12-04T10:35:20.3322193Z 
2025-12-04T10:35:20.3322422Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3323192Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3323200Z 
2025-12-04T10:35:20.3323437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3323609Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3323786Z ================== 1 failed, 30 deselected, 2 rerun in 2.74s ===================
2025-12-04T10:35:20.3323875Z Got exit code 1
2025-12-04T10:35:20.3323981Z Retrying single test...
2025-12-04T10:35:20.3324395Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml
2025-12-04T10:35:20.3324546Z ============================= test session starts ==============================
2025-12-04T10:35:20.3324869Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3324970Z cachedir: .pytest_cache
2025-12-04T10:35:20.3325442Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3325565Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3325680Z configfile: pytest.ini
2025-12-04T10:35:20.3326236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3326437Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.3327128Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3327240Z Running 1 items in this shard
2025-12-04T10:35:20.3327248Z 
2025-12-04T10:35:20.3328497Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3329588Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3329962Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3330364Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3330810Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3331276Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3331799Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3332311Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3332830Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3333312Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3333759Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3334315Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3334776Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3335263Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3335770Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3336239Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3336704Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3337128Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3337554Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3338005Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3338447Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3339154Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3339755Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3340351Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3340808Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3341240Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3341635Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3342062Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3342534Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3342949Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3343459Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3343884Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3344346Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3344859Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3345406Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3345946Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3346378Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3346784Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3347341Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3347890Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3348519Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3349135Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3349817Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3350479Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3351063Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3351710Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3352034Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3354292Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3354813Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3355770Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3356356Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3357134Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3357723Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3358533Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3359203Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3359733Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3360819Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3361137Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3361918Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3362039Z ('RERUN', {'yellow': True}) [1.8746s] [100%]
2025-12-04T10:35:20.3363332Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3364411Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3364792Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3365181Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3365582Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3366064Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3366535Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3367046Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3367593Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3368073Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3368509Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3369062Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3369525Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3370001Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3370546Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3371014Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3371473Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3371909Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3372329Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3372737Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3373182Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3373837Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3374447Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3375091Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3375567Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3376031Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3376432Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3376871Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3377268Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3377704Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3378173Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3378601Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3379248Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3379765Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3380318Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3380809Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3381241Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3381648Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3382189Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3382598Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3383100Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3383576Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3384186Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3384687Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3385147Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3385782Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3386132Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3388446Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3388932Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3389834Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3390390Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3391170Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3391798Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3392603Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3393276Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3393818Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3394940Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3395269Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3396043Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3396162Z ('RERUN', {'yellow': True}) [0.4061s] [100%]
2025-12-04T10:35:20.3397415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3398487Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3398871Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3399302Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3399710Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3400175Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3400653Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3401174Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3401679Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3402166Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3402552Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3403107Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3403607Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3404084Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3404628Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3405086Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3405552Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3406024Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3406481Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3406893Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3407327Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3408862Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3409460Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3410185Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3410787Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3411255Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3411652Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3412226Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3412750Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3413241Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3413740Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3414299Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3414852Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3415377Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3415927Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3416411Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3416935Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3417334Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3417893Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3418291Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3418790Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3419318Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3419985Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3420493Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3420939Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3421556Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3421869Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3424161Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3424642Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3425544Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3426104Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3426875Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3427478Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3428243Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3428922Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3429492Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3430623Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3430943Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3431713Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3431827Z FAILED [0.4099s] [100%]
2025-12-04T10:35:20.3431901Z 
2025-12-04T10:35:20.3432034Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.3432398Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3432513Z Traceback (most recent call last):
2025-12-04T10:35:20.3432882Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3433106Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3433531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3433755Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3434216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3434393Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3434843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3434983Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3435458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3435796Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3436295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3436436Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3436858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3436969Z     return self._compile_to_module()
2025-12-04T10:35:20.3437409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3437559Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3438020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3438140Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3438576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3438790Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3439300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3439416Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3439884Z   File "/tmp/tmp9h6j786w/qu/cquawwa4gvnmawtegqtb2rddoexidaw7vi3dinwdgotdx3la65zw.py", line 137, in <module>
2025-12-04T10:35:20.3440329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3440475Z     kernel.precompile(
2025-12-04T10:35:20.3440955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3441062Z     self._precompile_worker()
2025-12-04T10:35:20.3441593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3441752Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3442280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3442455Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3442887Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3443110Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3443504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3443798Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3444008Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3444715Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3444803Z ^
2025-12-04T10:35:20.3445204Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3445211Z 
2025-12-04T10:35:20.3445888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3445902Z 
2025-12-04T10:35:20.3445907Z 
2025-12-04T10:35:20.3446099Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3446913Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3446918Z 
2025-12-04T10:35:20.3447164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3447357Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3447460Z frames [('total', 1)]
2025-12-04T10:35:20.3447563Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3447975Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3448184Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3448272Z graph_break []
2025-12-04T10:35:20.3448624Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3448742Z Traceback (most recent call last):
2025-12-04T10:35:20.3449112Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3449325Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3449750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3449970Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3450420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3450635Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3451082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3451262Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3451729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3452022Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3452473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3452603Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3453029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3453144Z     return self._compile_to_module()
2025-12-04T10:35:20.3453613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3453758Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3454208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3454330Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3454761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3454969Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3455478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3455595Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3456074Z   File "/tmp/tmptwfxiuzo/ql/cqlbwfpkcdzcclgwbwdzgvro532w3bgf2ppp6rnm3ybjssmmbl5x.py", line 137, in <module>
2025-12-04T10:35:20.3456482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3456584Z     kernel.precompile(
2025-12-04T10:35:20.3457076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3457180Z     self._precompile_worker()
2025-12-04T10:35:20.3457750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3457907Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3458426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3458611Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3459004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3459269Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3459657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3459951Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3460158Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3460862Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3460946Z ^
2025-12-04T10:35:20.3461354Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3461404Z 
2025-12-04T10:35:20.3462027Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3462070Z 
2025-12-04T10:35:20.3462074Z 
2025-12-04T10:35:20.3462271Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3463031Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3463036Z 
2025-12-04T10:35:20.3463283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3463475Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3463567Z frames [('total', 1)]
2025-12-04T10:35:20.3463680Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3464130Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3464341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3464434Z graph_break []
2025-12-04T10:35:20.3464624Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3464733Z frames [('total', 1)]
2025-12-04T10:35:20.3464839Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3465036Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3465447Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3465536Z graph_break []
2025-12-04T10:35:20.3465669Z =================================== FAILURES ===================================
2025-12-04T10:35:20.3466035Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3466149Z Traceback (most recent call last):
2025-12-04T10:35:20.3466523Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3466727Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3467155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3467382Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3467875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3468050Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3468491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3468619Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3469092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3469418Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3470017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3470205Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3470803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3470943Z     return self._compile_to_module()
2025-12-04T10:35:20.3471424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3471619Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3472194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3472407Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3472976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3473296Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3473948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3474125Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3474688Z   File "/tmp/tmpqq4fse1_/7m/c7mezmlt7pzyraubputsbizgi6je765fehqvh2onofegcssc3wez.py", line 137, in <module>
2025-12-04T10:35:20.3475216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3475367Z     kernel.precompile(
2025-12-04T10:35:20.3476103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3476265Z     self._precompile_worker()
2025-12-04T10:35:20.3476988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3477195Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3477941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3478197Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3478771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3479082Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3479625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3480068Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3480338Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3481212Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3481294Z ^
2025-12-04T10:35:20.3481768Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3481774Z 
2025-12-04T10:35:20.3482400Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3482405Z 
2025-12-04T10:35:20.3482409Z 
2025-12-04T10:35:20.3482602Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3483368Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3483375Z 
2025-12-04T10:35:20.3483609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3483801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3483897Z frames [('total', 1)]
2025-12-04T10:35:20.3484001Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3484415Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3484613Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3484699Z graph_break []
2025-12-04T10:35:20.3484891Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3485149Z frames [('total', 1)]
2025-12-04T10:35:20.3485255Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3485456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3485905Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3486001Z graph_break []
2025-12-04T10:35:20.3486187Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3486281Z frames [('total', 1)]
2025-12-04T10:35:20.3486389Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3486581Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3486983Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3487082Z graph_break []
2025-12-04T10:35:20.3487696Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml -
2025-12-04T10:35:20.3487859Z =========================== short test summary info ============================
2025-12-04T10:35:20.3488598Z FAILED [0.4099s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3489300Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3489386Z ^
2025-12-04T10:35:20.3489787Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3489792Z 
2025-12-04T10:35:20.3490425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3490431Z 
2025-12-04T10:35:20.3490435Z 
2025-12-04T10:35:20.3490626Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3491395Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3491400Z 
2025-12-04T10:35:20.3491678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3491839Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3492026Z ================== 1 failed, 187 deselected, 2 rerun in 2.73s ==================
2025-12-04T10:35:20.3492114Z Got exit code 1
2025-12-04T10:35:20.3492211Z Retrying single test...
2025-12-04T10:35:20.3492637Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml
2025-12-04T10:35:20.3492788Z ============================= test session starts ==============================
2025-12-04T10:35:20.3493103Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3493202Z cachedir: .pytest_cache
2025-12-04T10:35:20.3493663Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3493780Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3493882Z configfile: pytest.ini
2025-12-04T10:35:20.3494354Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3494563Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.3495252Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3495408Z Running 1 items in this shard
2025-12-04T10:35:20.3495413Z 
2025-12-04T10:35:20.3496718Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3497845Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3498217Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3498650Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3499114Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3499582Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3500057Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3500562Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3501079Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3501562Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3501954Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3502515Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3503015Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3503494Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3503994Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3504456Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3504927Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3505358Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3505829Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3506231Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3506663Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3507327Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3508213Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3508896Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3509354Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3509782Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3510171Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3510659Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3511055Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3511476Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3511944Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3512372Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3512826Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3513348Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3513854Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3514349Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3514835Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3515242Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3515746Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3516144Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3516650Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3517120Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3517733Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3518239Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3518683Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3519303Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3519705Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3522000Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3522526Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3523449Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3523994Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3524769Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3525355Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3526168Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3526839Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3527412Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3528498Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3528822Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3529599Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3529721Z ('RERUN', {'yellow': True}) [1.8979s] [100%]
2025-12-04T10:35:20.3530971Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3532137Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3532682Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3533297Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3533762Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3534405Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3535022Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3535761Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3536442Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3537105Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3537553Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3538216Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3538811Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3539557Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3540207Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3540828Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3541511Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3542077Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3542669Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3543215Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3543799Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3544706Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3545475Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3546300Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3546946Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3547661Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3548202Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3548906Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3549459Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3550015Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3550608Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3551126Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3551587Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3552094Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3552590Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3553074Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3553499Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3553896Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3554392Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3554787Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3555274Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3555824Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3556425Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3556914Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3557360Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3557957Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3558264Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3560503Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3561041Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3561934Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3562464Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3563268Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3563846Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3564605Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3565255Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3565828Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3566897Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3567205Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3568117Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3568228Z ('RERUN', {'yellow': True}) [0.4056s] [100%]
2025-12-04T10:35:20.3569468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.3570526Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3570892Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.3571272Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.3571663Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3572119Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3572619Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3573156Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3573652Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3574124Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3574504Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3575076Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3575532Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.3576044Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.3576543Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3576988Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3577538Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3577962Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3578370Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3578772Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.3579270Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.3579961Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3580551Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3581137Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.3581587Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3581998Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.3582391Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.3582804Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.3583183Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.3583597Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.3584093Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.3584577Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.3585018Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.3585544Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3586073Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.3586593Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.3587026Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.3587425Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.3587913Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.3588302Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.3588787Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.3589247Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.3589850Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.3590347Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.3590827Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.3591426Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.3591735Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3593975Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3594441Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3595336Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3595920Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3596723Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3597302Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3598055Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3598759Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3599285Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3600356Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3600675Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3601440Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3601539Z FAILED [0.4046s] [100%]
2025-12-04T10:35:20.3601544Z 
2025-12-04T10:35:20.3601664Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.3602007Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3602119Z Traceback (most recent call last):
2025-12-04T10:35:20.3602516Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3602725Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3603142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3603350Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3603793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3603956Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3604386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3604520Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3604975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3605254Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3605699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3605820Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3606236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3606385Z     return self._compile_to_module()
2025-12-04T10:35:20.3606805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3606980Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3607414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3607528Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3608226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3608426Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3608938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3609049Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3609592Z   File "/tmp/tmphfo6cmb_/bz/cbz4aj6wg7oljizcrxvnda3ihrmadpwgczxt5ktckd5lv6bdm6rc.py", line 137, in <module>
2025-12-04T10:35:20.3609987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3610077Z     kernel.precompile(
2025-12-04T10:35:20.3610553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3610649Z     self._precompile_worker()
2025-12-04T10:35:20.3611162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3611315Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3611827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3612007Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3612392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3612607Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3612979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3613338Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3613536Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3614226Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3614296Z ^
2025-12-04T10:35:20.3614698Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3614705Z 
2025-12-04T10:35:20.3615312Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3615318Z 
2025-12-04T10:35:20.3615322Z 
2025-12-04T10:35:20.3615513Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3616262Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3616267Z 
2025-12-04T10:35:20.3616502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3616684Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3616766Z frames [('total', 1)]
2025-12-04T10:35:20.3616926Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3617332Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3617532Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3617669Z graph_break []
2025-12-04T10:35:20.3618010Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3618114Z Traceback (most recent call last):
2025-12-04T10:35:20.3618474Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3618668Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3619129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3619345Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3620455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3620623Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3626234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3626384Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3626854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3627134Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3627589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3627717Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3628140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3628245Z     return self._compile_to_module()
2025-12-04T10:35:20.3628662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3628810Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3629253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3629438Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3629860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3630058Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3630571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3630681Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3631124Z   File "/tmp/tmprzfkhit7/bc/cbcqw6tefexgqfmhlfmsm35v27raw3lgjizd6ai4q4vwem62jst7.py", line 137, in <module>
2025-12-04T10:35:20.3631527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3631617Z     kernel.precompile(
2025-12-04T10:35:20.3632111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3632215Z     self._precompile_worker()
2025-12-04T10:35:20.3632725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3632884Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3633390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3633611Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3633994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3634241Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3634622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3634916Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3635111Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3635812Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3635891Z ^
2025-12-04T10:35:20.3636332Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3636339Z 
2025-12-04T10:35:20.3636945Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3636952Z 
2025-12-04T10:35:20.3636956Z 
2025-12-04T10:35:20.3637148Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3637901Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3637907Z 
2025-12-04T10:35:20.3638132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3638330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3638420Z frames [('total', 1)]
2025-12-04T10:35:20.3638527Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3638931Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3639120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3639210Z graph_break []
2025-12-04T10:35:20.3639395Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3639478Z frames [('total', 1)]
2025-12-04T10:35:20.3639628Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3639812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3640208Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3640292Z graph_break []
2025-12-04T10:35:20.3640413Z =================================== FAILURES ===================================
2025-12-04T10:35:20.3640766Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _
2025-12-04T10:35:20.3640870Z Traceback (most recent call last):
2025-12-04T10:35:20.3641229Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3641435Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3641849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3642070Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3642506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3642669Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3643113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3643281Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3643740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3644051Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3644491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3644624Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3645031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3645131Z     return self._compile_to_module()
2025-12-04T10:35:20.3645551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3645689Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3646176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3646285Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3646711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3646911Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3647411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3647524Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3647945Z   File "/tmp/tmp9qv8_m5o/ms/cmsnsj7uefdv2k4uimmgbctlqtbmhqvsjbc764nvmryqbe73lbvq.py", line 137, in <module>
2025-12-04T10:35:20.3648339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3648443Z     kernel.precompile(
2025-12-04T10:35:20.3648917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3649017Z     self._precompile_worker()
2025-12-04T10:35:20.3649531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3649682Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3650243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3650410Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3650791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3651008Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3651385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3651675Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3651870Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3652563Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3652648Z ^
2025-12-04T10:35:20.3653036Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3653042Z 
2025-12-04T10:35:20.3653656Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3653704Z 
2025-12-04T10:35:20.3653709Z 
2025-12-04T10:35:20.3653894Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3654639Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3654718Z 
2025-12-04T10:35:20.3654947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3655132Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3655226Z frames [('total', 1)]
2025-12-04T10:35:20.3655326Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3655724Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3655919Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3656004Z graph_break []
2025-12-04T10:35:20.3656237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3656324Z frames [('total', 1)]
2025-12-04T10:35:20.3656420Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3656616Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3657013Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3657098Z graph_break []
2025-12-04T10:35:20.3657290Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3657375Z frames [('total', 1)]
2025-12-04T10:35:20.3657470Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3657665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3658062Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3658154Z graph_break []
2025-12-04T10:35:20.3658717Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml -
2025-12-04T10:35:20.3658866Z =========================== short test summary info ============================
2025-12-04T10:35:20.3659673Z FAILED [0.4046s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3660407Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3660489Z ^
2025-12-04T10:35:20.3660881Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3660889Z 
2025-12-04T10:35:20.3661495Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3661507Z 
2025-12-04T10:35:20.3661513Z 
2025-12-04T10:35:20.3661697Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3662443Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3662448Z 
2025-12-04T10:35:20.3662682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3662834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3663011Z ================== 1 failed, 187 deselected, 2 rerun in 2.74s ==================
2025-12-04T10:35:20.3663098Z Got exit code 1
2025-12-04T10:35:20.3663680Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3664045Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.3664494Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml
2025-12-04T10:35:20.3664628Z ============================= test session starts ==============================
2025-12-04T10:35:20.3664933Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3665025Z cachedir: .pytest_cache
2025-12-04T10:35:20.3665477Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3665580Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3665670Z configfile: pytest.ini
2025-12-04T10:35:20.3666183Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3666380Z collecting ... collected 188 items / 31 deselected / 157 selected
2025-12-04T10:35:20.3666504Z stepcurrent: skipping 31 already run items.
2025-12-04T10:35:20.3666609Z Running 157 items in this shard
2025-12-04T10:35:20.3666614Z 
2025-12-04T10:35:20.3667789Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3668722Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3669099Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.3669485Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3669874Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3670377Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3670844Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3671337Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3671837Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3672309Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3672686Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3673055Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3673559Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3674066Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3674591Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3675136Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3675633Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3676091Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3676519Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3676922Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3677334Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3678048Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3678498Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3679007Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3679621Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3680158Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3680504Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3681040Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3681538Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3682128Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3682753Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3683166Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3683598Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3684011Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3684552Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3685021Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3685488Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3685986Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3686479Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3686969Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3687392Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3687804Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3688216Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3688926Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3689391Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3689825Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3690214Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3690650Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3691035Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3691474Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3691948Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3692375Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3692836Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3693381Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3693895Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3694368Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3694799Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3695207Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3695725Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3696165Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3696657Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3697126Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3697640Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3698197Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3698707Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3699017Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3701135Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3701595Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3702497Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3703032Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3703793Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3704386Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3705144Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3705853Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3706376Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3707329Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3707638Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3708657Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3708772Z ('RERUN', {'yellow': True}) [1.7831s] [  0%]
2025-12-04T10:35:20.3709938Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3710955Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3711393Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.3711785Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3712180Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3712643Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3713190Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3713685Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3714193Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3714667Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3715067Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3715452Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3715980Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3716481Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3716995Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3717550Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3718003Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3718450Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3718872Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3719286Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3719698Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3720355Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3720796Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3721299Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3721907Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3722467Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3722845Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3723375Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3723871Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3724418Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3725062Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3725474Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3725887Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3726289Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3726823Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3727275Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3727748Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3728250Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3728701Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3729192Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3729618Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3730018Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3730421Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3731084Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3731540Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3731959Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3732349Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3732777Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3733208Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3733636Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3734131Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3734549Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3735006Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3735511Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3736101Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3736584Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3737006Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3737407Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3737894Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3738285Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3738776Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3739287Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3739800Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3740360Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3740835Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3741136Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3743150Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3743608Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3744508Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3745093Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3745936Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3746527Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3747279Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3748068Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3748590Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3749528Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3749840Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3750609Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3750722Z ('RERUN', {'yellow': True}) [0.3392s] [  0%]
2025-12-04T10:35:20.3751886Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3752989Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3753370Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.3753760Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3754145Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3754606Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3755071Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3755576Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3756122Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3756590Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3756972Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3757380Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3757882Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3758422Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3758934Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3759424Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3759870Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3760355Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3760776Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3761181Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3761577Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3762229Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3762669Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3763173Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3763784Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3764340Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3764677Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3765197Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3765744Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3766289Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3766983Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3767394Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3767802Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3768199Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3768787Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3769240Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3769742Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3770242Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3770687Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3771137Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3771592Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3771999Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3772401Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3773060Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3773514Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3773932Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3774321Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3774746Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3775133Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3775626Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3776108Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3776526Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3776972Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3777478Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3777979Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3778452Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3778870Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3779333Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3779825Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3780260Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3780782Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3781236Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3781746Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3782237Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3782768Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3783079Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3785083Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3785564Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3786487Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3787016Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3787818Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3788397Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3789145Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3789808Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3790329Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3791259Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3791565Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3792372Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3792496Z FAILED [0.3384s] [  0%]
2025-12-04T10:35:20.3792501Z 
2025-12-04T10:35:20.3792619Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.3792969Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.3793076Z Traceback (most recent call last):
2025-12-04T10:35:20.3793443Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3793642Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3794058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3794319Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3794761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3794926Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3795362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3795483Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3795946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3796222Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3796665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3796799Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3797209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3797316Z     return self._compile_to_module()
2025-12-04T10:35:20.3797733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3797870Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3798371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3798479Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3798899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3799099Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3799598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3799714Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3800137Z   File "/tmp/tmpzjmaw_kz/yw/cyw64nfiorcf2siwfkmktiivuijku7y4kmp6tsf54uxbkikimb66.py", line 65, in <module>
2025-12-04T10:35:20.3800539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3800644Z     kernel.precompile(
2025-12-04T10:35:20.3801118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3801218Z     self._precompile_worker()
2025-12-04T10:35:20.3801722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3801869Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3802429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3802598Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3803021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3803234Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3803608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3803899Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3804091Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3804641Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3804723Z ^
2025-12-04T10:35:20.3805156Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3805162Z 
2025-12-04T10:35:20.3805812Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3805818Z 
2025-12-04T10:35:20.3805823Z 
2025-12-04T10:35:20.3806018Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3806789Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3806794Z 
2025-12-04T10:35:20.3807023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3807205Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3807302Z frames [('total', 1)]
2025-12-04T10:35:20.3807397Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3808036Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3808233Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3808313Z graph_break []
2025-12-04T10:35:20.3808663Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.3808843Z Traceback (most recent call last):
2025-12-04T10:35:20.3809204Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3809400Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3809817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3810032Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3810473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3810638Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3811078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3811196Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3811650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3811926Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3812368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3812556Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3812965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3813061Z     return self._compile_to_module()
2025-12-04T10:35:20.3813532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3813666Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3814103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3814215Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3814631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3814834Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3815388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3815498Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3815979Z   File "/tmp/tmpjujnynkf/tu/ctuoy5iuboo2w6ka63qwlilernpnne76wje6f3sicc5s5ry6t4rs.py", line 65, in <module>
2025-12-04T10:35:20.3816379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3816476Z     kernel.precompile(
2025-12-04T10:35:20.3816954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3817056Z     self._precompile_worker()
2025-12-04T10:35:20.3817573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3817718Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3818246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3818416Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3818796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3819011Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3819478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3819762Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3819966Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3820523Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3820598Z ^
2025-12-04T10:35:20.3820992Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3820997Z 
2025-12-04T10:35:20.3821605Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3821618Z 
2025-12-04T10:35:20.3821622Z 
2025-12-04T10:35:20.3821806Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3822568Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3822573Z 
2025-12-04T10:35:20.3822807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3822990Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3823130Z frames [('total', 1)]
2025-12-04T10:35:20.3823232Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3823636Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3823904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3823983Z graph_break []
2025-12-04T10:35:20.3824164Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3824263Z frames [('total', 1)]
2025-12-04T10:35:20.3824366Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3824556Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3824959Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3825040Z graph_break []
2025-12-04T10:35:20.3825176Z =================================== FAILURES ===================================
2025-12-04T10:35:20.3825594Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.3825714Z Traceback (most recent call last):
2025-12-04T10:35:20.3826111Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3826306Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3826737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3826947Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3827385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3827565Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3828002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3828124Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3828591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3828867Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3829364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3829491Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3829903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3830017Z     return self._compile_to_module()
2025-12-04T10:35:20.3830435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3830588Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3831035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3831142Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3831572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3831769Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3832271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3832379Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3832823Z   File "/tmp/tmpgsikz3dy/jp/cjptyxbztdc4hx6s5p4yoya4vwahzgybnb33pe44qoussrewtnpv.py", line 65, in <module>
2025-12-04T10:35:20.3833273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3833366Z     kernel.precompile(
2025-12-04T10:35:20.3833841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3833981Z     self._precompile_worker()
2025-12-04T10:35:20.3834487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3834646Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3835155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3835323Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3835763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3836012Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3836396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3836698Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3836893Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3837460Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3837531Z ^
2025-12-04T10:35:20.3837928Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3837933Z 
2025-12-04T10:35:20.3838546Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3838556Z 
2025-12-04T10:35:20.3838560Z 
2025-12-04T10:35:20.3838749Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3839513Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3839521Z 
2025-12-04T10:35:20.3839750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3839990Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3840079Z frames [('total', 1)]
2025-12-04T10:35:20.3840175Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3840598Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3840792Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3840876Z graph_break []
2025-12-04T10:35:20.3841068Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3841155Z frames [('total', 1)]
2025-12-04T10:35:20.3841262Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3841444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3841846Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3841934Z graph_break []
2025-12-04T10:35:20.3842121Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3842209Z frames [('total', 1)]
2025-12-04T10:35:20.3842313Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3842502Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3842897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3843027Z graph_break []
2025-12-04T10:35:20.3843581Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml -
2025-12-04T10:35:20.3843775Z =========================== short test summary info ============================
2025-12-04T10:35:20.3844524Z FAILED [0.3384s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3845087Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3845163Z ^
2025-12-04T10:35:20.3845558Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3845567Z 
2025-12-04T10:35:20.3846229Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3846235Z 
2025-12-04T10:35:20.3846243Z 
2025-12-04T10:35:20.3846434Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3847204Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3847211Z 
2025-12-04T10:35:20.3847434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3847582Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3847767Z ================== 1 failed, 31 deselected, 2 rerun in 2.49s ===================
2025-12-04T10:35:20.3847854Z Got exit code 1
2025-12-04T10:35:20.3847956Z Retrying single test...
2025-12-04T10:35:20.3848364Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml
2025-12-04T10:35:20.3848500Z ============================= test session starts ==============================
2025-12-04T10:35:20.3848809Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3848902Z cachedir: .pytest_cache
2025-12-04T10:35:20.3849392Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3849503Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3849593Z configfile: pytest.ini
2025-12-04T10:35:20.3850064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3850253Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.3850949Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3851053Z Running 1 items in this shard
2025-12-04T10:35:20.3851059Z 
2025-12-04T10:35:20.3852226Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3853168Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3853550Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.3853996Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3854384Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3854880Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3855351Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3855899Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3856410Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3856918Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3857294Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3857664Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3858249Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3858757Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3859302Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3859792Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3860259Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3860704Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3861183Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3861593Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3861984Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3862654Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3863109Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3863615Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3864228Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3864754Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3865092Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3865699Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3866277Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3866828Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3867429Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3867838Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3868277Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3868683Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3869221Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3869684Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3870155Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3870659Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3871107Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3871556Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3872059Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3872591Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3873173Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3873837Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3874286Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3874710Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3875100Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3875529Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3875913Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3876329Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3876834Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3877303Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3877748Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3878288Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3878780Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3879257Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3879673Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3880107Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3880593Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3880981Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3881467Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3881920Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3882422Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3882913Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3883379Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3883681Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3885817Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3886273Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3887169Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3887705Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3888458Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3889084Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3889868Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3890525Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3891043Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3892012Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3892320Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3893079Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3893193Z ('RERUN', {'yellow': True}) [1.8022s] [100%]
2025-12-04T10:35:20.3894357Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3895282Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3895672Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.3896092Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3896517Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3896965Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3897425Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3897917Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3898412Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3898879Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3899315Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3899680Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3900181Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3900684Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3901243Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3901768Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3902257Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3902704Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3903123Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3903565Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3903970Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3904628Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3905072Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3905578Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3906180Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3906707Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3907047Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3907565Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3908391Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3908944Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3909544Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3909948Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3910357Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3910750Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3911284Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3911735Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3912196Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3912747Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3913249Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3913695Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3914113Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3914511Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3914902Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3915638Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3916113Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3916539Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3916928Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3917361Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3917742Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3918168Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3918628Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3919051Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3919541Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3920043Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3920536Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3921013Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3921429Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3921825Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3922309Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3922697Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3923184Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3923689Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3924200Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3924754Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3925219Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3925549Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3927623Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3928083Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3928978Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3929513Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3930268Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3936275Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3937135Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3937802Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3938332Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3939326Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3939644Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3940406Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3940526Z ('RERUN', {'yellow': True}) [0.3409s] [100%]
2025-12-04T10:35:20.3941696Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3942712Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3943087Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.3943479Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.3943863Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.3944359Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3944829Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3945320Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3945878Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3946345Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3946724Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.3947099Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.3947603Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3948109Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3948668Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3949159Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3949614Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3950067Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3950490Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3950896Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3951291Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3951947Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3952396Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3952903Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3953553Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3954107Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3954451Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.3954968Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3955470Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3956059Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3956660Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3957066Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3957468Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3957862Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3958397Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3958854Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3959316Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3959811Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3960306Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3960751Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3961168Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.3961579Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.3961977Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.3962637Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3963087Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3963515Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3963901Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.3964375Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3964762Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.3965221Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3965682Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3966152Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3966599Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3967142Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3967646Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3968119Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3968546Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3968947Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.3969434Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3969835Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.3970321Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3970777Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3971328Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3971816Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3972286Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3972593Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3974611Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3975065Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3976051Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3976644Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3977402Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3977990Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3978779Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3979519Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3980044Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3980980Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3981290Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3982050Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3982143Z FAILED [0.3398s] [100%]
2025-12-04T10:35:20.3982149Z 
2025-12-04T10:35:20.3982268Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.3982623Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.3982736Z Traceback (most recent call last):
2025-12-04T10:35:20.3983164Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3983371Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3983788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3984014Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3984459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3984623Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3985070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3985195Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3985675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3985987Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3986430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3986566Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3986973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3987117Z     return self._compile_to_module()
2025-12-04T10:35:20.3987537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3987721Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3988171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3988281Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3988701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3988908Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3989410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3989561Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3990015Z   File "/tmp/tmppumwox5z/ga/cgaga4fcmswxmfr4dvripvwppumijpm34xl47zyazd2lbidr63sr.py", line 65, in <module>
2025-12-04T10:35:20.3990414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3990521Z     kernel.precompile(
2025-12-04T10:35:20.3990998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3991098Z     self._precompile_worker()
2025-12-04T10:35:20.3991620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3991775Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3992289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3992463Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3992845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3993064Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3993441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3993778Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3993974Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3994525Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3994605Z ^
2025-12-04T10:35:20.3995002Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3995012Z 
2025-12-04T10:35:20.3995616Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3995630Z 
2025-12-04T10:35:20.3995634Z 
2025-12-04T10:35:20.3995817Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3996585Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3996590Z 
2025-12-04T10:35:20.3996821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3997003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3997094Z frames [('total', 1)]
2025-12-04T10:35:20.3997234Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3997638Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3997829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3997951Z graph_break []
2025-12-04T10:35:20.3998298Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.3998404Z Traceback (most recent call last):
2025-12-04T10:35:20.3998768Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3998967Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3999381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3999593Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4000078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4000240Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4000680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4000800Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4001255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4001535Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4001986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4002108Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4002526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4002629Z     return self._compile_to_module()
2025-12-04T10:35:20.4003062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4003207Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4003653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4003817Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4004242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4004455Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4004962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4005073Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4005527Z   File "/tmp/tmpprtorvna/bf/cbfnpwqlaszmm75ijj7mv2mu6lpsnqq6wq6dnli45nfcnss3ezsk.py", line 65, in <module>
2025-12-04T10:35:20.4005970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4006077Z     kernel.precompile(
2025-12-04T10:35:20.4006564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4006668Z     self._precompile_worker()
2025-12-04T10:35:20.4007186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4007339Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4008127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4008389Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4008773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4009043Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4009423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4009708Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4009915Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4010472Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4010546Z ^
2025-12-04T10:35:20.4010994Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4011002Z 
2025-12-04T10:35:20.4011607Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4011615Z 
2025-12-04T10:35:20.4011620Z 
2025-12-04T10:35:20.4011808Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4012570Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4012575Z 
2025-12-04T10:35:20.4012809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4012992Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4013079Z frames [('total', 1)]
2025-12-04T10:35:20.4013189Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4013597Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4013796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4013881Z graph_break []
2025-12-04T10:35:20.4014065Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4014161Z frames [('total', 1)]
2025-12-04T10:35:20.4014257Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4014507Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4014909Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4014992Z graph_break []
2025-12-04T10:35:20.4015117Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4015478Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4015582Z Traceback (most recent call last):
2025-12-04T10:35:20.4015999Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4016199Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4016610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4016829Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4017267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4017441Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4017879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4018073Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4018537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4018849Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4019344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4019470Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4019879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4019985Z     return self._compile_to_module()
2025-12-04T10:35:20.4020397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4020533Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4021021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4021130Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4021558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4021754Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4022251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4022364Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4022802Z   File "/tmp/tmpj0g4eowh/ms/cmsbtdxe5kc65vlifejvtsqxlhqyiibj4nnpcuvakd7bzcw4xh6y.py", line 65, in <module>
2025-12-04T10:35:20.4023201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4023293Z     kernel.precompile(
2025-12-04T10:35:20.4023764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4023862Z     self._precompile_worker()
2025-12-04T10:35:20.4024368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4024516Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4025075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4025241Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4025653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4025908Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4026399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4026698Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4026892Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4027459Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4027529Z ^
2025-12-04T10:35:20.4027921Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4027927Z 
2025-12-04T10:35:20.4028540Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4028545Z 
2025-12-04T10:35:20.4028606Z 
2025-12-04T10:35:20.4028788Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4029554Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4029602Z 
2025-12-04T10:35:20.4029828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4030010Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4030103Z frames [('total', 1)]
2025-12-04T10:35:20.4030198Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4030604Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4030787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4030868Z graph_break []
2025-12-04T10:35:20.4031054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4031178Z frames [('total', 1)]
2025-12-04T10:35:20.4031274Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4031459Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4031856Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4031944Z graph_break []
2025-12-04T10:35:20.4032122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4032204Z frames [('total', 1)]
2025-12-04T10:35:20.4032302Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4032483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4032877Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4032965Z graph_break []
2025-12-04T10:35:20.4033526Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml -
2025-12-04T10:35:20.4033675Z =========================== short test summary info ============================
2025-12-04T10:35:20.4034415Z FAILED [0.3398s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4035009Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4035088Z ^
2025-12-04T10:35:20.4035478Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4035482Z 
2025-12-04T10:35:20.4036096Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4036104Z 
2025-12-04T10:35:20.4036108Z 
2025-12-04T10:35:20.4036290Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4037046Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4037056Z 
2025-12-04T10:35:20.4037282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4037437Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4037619Z ================== 1 failed, 187 deselected, 2 rerun in 2.52s ==================
2025-12-04T10:35:20.4037699Z Got exit code 1
2025-12-04T10:35:20.4037790Z Retrying single test...
2025-12-04T10:35:20.4038202Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml
2025-12-04T10:35:20.4038388Z ============================= test session starts ==============================
2025-12-04T10:35:20.4038689Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4038818Z cachedir: .pytest_cache
2025-12-04T10:35:20.4039263Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4039373Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4039468Z configfile: pytest.ini
2025-12-04T10:35:20.4040039Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4040242Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.4040932Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4041083Z Running 1 items in this shard
2025-12-04T10:35:20.4041088Z 
2025-12-04T10:35:20.4042253Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.4043198Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4043579Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.4043964Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.4044367Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4044819Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4045292Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4045828Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4046328Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4046807Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4047192Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.4047567Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.4048076Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4048583Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4049095Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4049582Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4050089Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4050578Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4051001Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.4051413Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.4051809Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.4052482Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4052974Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4053487Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4054098Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.4054604Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.4054948Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.4055467Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.4055977Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.4056523Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.4057191Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.4057599Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.4058000Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.4058413Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.4058950Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.4059461Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4059934Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.4060423Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4060885Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4061379Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4061797Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.4062238Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.4062630Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.4063293Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4063741Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4064207Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.4064592Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.4065025Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.4065421Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.4065851Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.4066315Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.4066743Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.4067204Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.4067715Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4068249Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.4068742Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.4069169Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.4069572Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.4070065Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.4070455Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.4070953Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.4071414Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.4071927Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.4072419Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.4072928Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.4073292Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4075340Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4075811Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4076708Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4077255Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4078013Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4078609Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4079366Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4080043Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4080610Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4081544Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4081871Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4082635Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4082758Z ('RERUN', {'yellow': True}) [1.8046s] [100%]
2025-12-04T10:35:20.4083923Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.4084863Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4085279Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.4085748Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.4086153Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4086607Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4087072Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4087565Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4088096Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4088585Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4088971Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.4089344Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.4089848Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4090361Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4090873Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4091362Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4091827Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4092321Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4092746Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.4093151Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.4093546Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.4094216Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4094663Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4095168Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4095788Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.4096346Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.4096724Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.4097286Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.4097789Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.4098331Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.4098939Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.4099457Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.4099859Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.4100273Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.4100812Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.4101264Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4101734Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.4102228Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4102690Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4103140Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4103597Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.4103998Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.4104403Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.4105073Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4105521Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4105996Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.4106381Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.4106808Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.4107191Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.4107608Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.4108410Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.4108900Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.4109346Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.4109846Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4110335Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.4110816Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.4111293Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.4111698Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.4112182Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.4112570Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.4113059Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.4113514Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.4114029Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.4114515Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.4114980Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.4115343Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4117352Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4117820Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4118706Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4119241Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4120050Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4120665Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4121417Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4122073Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4122587Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4123562Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4123874Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4124636Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4124757Z ('RERUN', {'yellow': True}) [0.3420s] [100%]
2025-12-04T10:35:20.4125963Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.4126972Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4127345Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.4127767Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.4128167Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4128612Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4129077Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4129566Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4130071Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4130543Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4130917Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.4131284Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.4131793Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4132339Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4132895Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4133384Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4133841Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4134290Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4134751Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.4135162Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.4135558Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.4136273Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4136720Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4137229Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4137841Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.4138361Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.4138701Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.4139314Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.4139815Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.4140364Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.4140968Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.4141371Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.4141777Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.4142181Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.4142713Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.4143165Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4143670Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.4144221Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4144678Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4145120Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4145535Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.4145979Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.4146384Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.4147046Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4147493Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4147919Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.4148305Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.4148737Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.4149124Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.4149547Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.4150005Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.4150468Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.4150914Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.4151413Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4151909Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.4152389Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.4152809Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.4153208Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.4153693Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.4154073Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.4154606Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.4155060Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.4155607Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.4156099Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.4156578Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.4156880Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4158925Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4159403Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4160299Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4160845Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4161604Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4162226Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4162987Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4163655Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4164180Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4165122Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4165432Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4166245Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4166384Z FAILED [0.3404s] [100%]
2025-12-04T10:35:20.4166391Z 
2025-12-04T10:35:20.4166515Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4166882Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4167023Z Traceback (most recent call last):
2025-12-04T10:35:20.4167385Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4167606Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4168024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4168241Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4168697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4168907Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4169348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4169476Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4169939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4170227Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4170687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4170820Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4171229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4171333Z     return self._compile_to_module()
2025-12-04T10:35:20.4171754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4171892Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4172331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4172445Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4172914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4173127Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4173629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4173736Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4174192Z   File "/tmp/tmphkws7to1/bu/cbud6absdf4pp2bsbjheogcffjal6tennyacnnfpntpy72bgetgq.py", line 65, in <module>
2025-12-04T10:35:20.4174596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4174698Z     kernel.precompile(
2025-12-04T10:35:20.4175178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4175280Z     self._precompile_worker()
2025-12-04T10:35:20.4175806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4175958Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4176473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4176649Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4177077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4177293Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4177712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4178002Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4178212Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4178776Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4178856Z ^
2025-12-04T10:35:20.4179339Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4179344Z 
2025-12-04T10:35:20.4180012Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4180017Z 
2025-12-04T10:35:20.4180031Z 
2025-12-04T10:35:20.4180226Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4180985Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4180993Z 
2025-12-04T10:35:20.4181229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4181410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4181496Z frames [('total', 1)]
2025-12-04T10:35:20.4181601Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4182005Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4182212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4182293Z graph_break []
2025-12-04T10:35:20.4182639Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4182749Z Traceback (most recent call last):
2025-12-04T10:35:20.4183107Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4183354Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4183778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4183994Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4184436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4184600Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4185032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4185162Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4185635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4185946Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4186384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4186501Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4186911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4187085Z     return self._compile_to_module()
2025-12-04T10:35:20.4187504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4187648Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4188129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4188246Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4188673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4188865Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4189368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4189477Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4189958Z   File "/tmp/tmpv892o071/ck/cckds2hb4vv22vzh6yfutpmfmk47yxrogjxbfgt63xscwtwn6k52.py", line 65, in <module>
2025-12-04T10:35:20.4190353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4190445Z     kernel.precompile(
2025-12-04T10:35:20.4190922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4191020Z     self._precompile_worker()
2025-12-04T10:35:20.4191533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4191682Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4192185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4192354Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4192742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4192944Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4193329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4193610Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4193856Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4194411Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4194481Z ^
2025-12-04T10:35:20.4194873Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4194880Z 
2025-12-04T10:35:20.4195491Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4195496Z 
2025-12-04T10:35:20.4195502Z 
2025-12-04T10:35:20.4195711Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4196491Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4196499Z 
2025-12-04T10:35:20.4196728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4196906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4196991Z frames [('total', 1)]
2025-12-04T10:35:20.4197097Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4197497Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4197726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4197807Z graph_break []
2025-12-04T10:35:20.4198021Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4198107Z frames [('total', 1)]
2025-12-04T10:35:20.4198205Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4198386Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4198786Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4198865Z graph_break []
2025-12-04T10:35:20.4198987Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4199349Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4199450Z Traceback (most recent call last):
2025-12-04T10:35:20.4199857Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4200051Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4200469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4200679Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4201115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4201274Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4201714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4201840Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4202302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4202573Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4203019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4203147Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4203594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4203696Z     return self._compile_to_module()
2025-12-04T10:35:20.4204106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4204242Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4204685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4204794Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4205215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4205413Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4205912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4206020Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4206451Z   File "/tmp/tmpqnqivahn/qs/cqs7xfaba5od4xvdspjvom257aczm2wrow2ufllaxquibskunm4a.py", line 65, in <module>
2025-12-04T10:35:20.4206840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4206938Z     kernel.precompile(
2025-12-04T10:35:20.4207407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4207547Z     self._precompile_worker()
2025-12-04T10:35:20.4208265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4208484Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4208993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4209159Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4209537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4209743Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4210113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4210474Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4210669Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4211225Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4211303Z ^
2025-12-04T10:35:20.4211694Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4211699Z 
2025-12-04T10:35:20.4212318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4212323Z 
2025-12-04T10:35:20.4212327Z 
2025-12-04T10:35:20.4212509Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4213288Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4213293Z 
2025-12-04T10:35:20.4213529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4213708Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4213798Z frames [('total', 1)]
2025-12-04T10:35:20.4213896Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4214352Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4214556Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4214639Z graph_break []
2025-12-04T10:35:20.4214835Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4214925Z frames [('total', 1)]
2025-12-04T10:35:20.4215027Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4215214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4215609Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4215687Z graph_break []
2025-12-04T10:35:20.4215874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4215957Z frames [('total', 1)]
2025-12-04T10:35:20.4216056Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4216241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4216637Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4216720Z graph_break []
2025-12-04T10:35:20.4217275Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml -
2025-12-04T10:35:20.4217487Z =========================== short test summary info ============================
2025-12-04T10:35:20.4218222Z FAILED [0.3404s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4218815Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4218896Z ^
2025-12-04T10:35:20.4219330Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4219336Z 
2025-12-04T10:35:20.4219942Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4219955Z 
2025-12-04T10:35:20.4219959Z 
2025-12-04T10:35:20.4220183Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4220945Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4220952Z 
2025-12-04T10:35:20.4221188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4221349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4221524Z ================== 1 failed, 187 deselected, 2 rerun in 2.52s ==================
2025-12-04T10:35:20.4221606Z Got exit code 1
2025-12-04T10:35:20.4222158Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4222511Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.4222920Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml
2025-12-04T10:35:20.4223058Z ============================= test session starts ==============================
2025-12-04T10:35:20.4223357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4223449Z cachedir: .pytest_cache
2025-12-04T10:35:20.4223967Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4224070Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4224160Z configfile: pytest.ini
2025-12-04T10:35:20.4224631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4224826Z collecting ... collected 188 items / 32 deselected / 156 selected
2025-12-04T10:35:20.4224964Z stepcurrent: skipping 32 already run items.
2025-12-04T10:35:20.4225060Z Running 156 items in this shard
2025-12-04T10:35:20.4225066Z 
2025-12-04T10:35:20.4226297Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4227244Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4227608Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4227998Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4228480Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4228911Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4233795Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4234288Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4234794Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4235294Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4235840Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4236215Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4236661Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4237070Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4237461Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4237844Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4238391Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4238842Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4239320Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4239795Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4240297Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4240749Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4241244Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4241704Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4242185Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4242647Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4243081Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4243495Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4243908Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4244356Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4244903Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4245360Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4245906Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4246304Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4246675Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4247144Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4247515Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4247935Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4248383Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4248783Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4249217Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4249712Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4250216Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4250761Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4251217Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4251610Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4252098Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4252477Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4252966Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4253418Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4253866Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4254462Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4255072Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4255430Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4257290Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4257797Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4258737Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4259361Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4260134Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4260732Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4261482Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4262155Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4262678Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4263669Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4263981Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4264747Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4264876Z ('RERUN', {'yellow': True}) [1.7478s] [  0%]
2025-12-04T10:35:20.4266103Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4267051Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4267415Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4267807Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4268317Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4268752Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4269224Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4269688Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4270209Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4270707Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4271256Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4271643Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4272092Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4272512Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4272900Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4273280Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4273843Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4274296Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4274776Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4275245Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4275785Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4276259Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4276757Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4277220Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4277701Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4278172Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4278607Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4279023Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4279437Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4279897Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4280458Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4280914Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4281401Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4281813Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4282186Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4282661Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4283030Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4283445Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4283909Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4284320Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4284770Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4285272Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4285791Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4286371Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4286918Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4287307Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4287791Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4288169Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4288654Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4289107Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4289549Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4290153Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4290763Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4291108Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4292904Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4293402Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4294333Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4294880Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4295635Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4296226Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4297081Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4297759Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4298278Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4299318Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4299632Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4300397Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4300517Z ('RERUN', {'yellow': True}) [0.3110s] [  0%]
2025-12-04T10:35:20.4301731Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4302675Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4303033Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4303409Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4303890Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4304276Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4304772Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4305234Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4305781Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4306272Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4306784Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4307163Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4307605Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4308404Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4308790Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4309161Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4309710Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4310154Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4310622Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4311215Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4311708Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4312171Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4312668Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4313130Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4313612Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4314072Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4314500Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4314911Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4315318Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4315782Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4316285Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4316831Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4317317Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4317724Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4318088Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4318562Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4318929Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4319334Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4319794Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4320195Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4320629Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4321123Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4321611Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4322156Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4322605Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4322992Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4323474Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4323844Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4324336Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4324786Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4325222Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4325824Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4326431Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4326780Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4328567Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4329073Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4330003Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4330551Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4331314Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4331904Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4332654Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4333325Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4333849Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4334824Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4335140Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4335957Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4336056Z FAILED [0.3111s] [  0%]
2025-12-04T10:35:20.4336061Z 
2025-12-04T10:35:20.4336181Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4336523Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4336628Z Traceback (most recent call last):
2025-12-04T10:35:20.4336993Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4337206Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4337620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4337835Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4338284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4338495Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4338945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4339167Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4339623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4339914Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4340358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4340485Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4340890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4340993Z     return self._compile_to_module()
2025-12-04T10:35:20.4341453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4341593Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4342035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4342152Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4342573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4342775Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4343269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4343373Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4343811Z   File "/tmp/tmpb0rkkcyh/rv/crv5h2l66ynzs6ygycxuobosay3yagqb4q7fes2zv3g3gw3phsof.py", line 74, in <module>
2025-12-04T10:35:20.4344202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4344301Z     kernel.precompile(
2025-12-04T10:35:20.4344772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4344872Z     self._precompile_worker()
2025-12-04T10:35:20.4345434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4345584Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4346090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4346267Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4346652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4346864Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4347239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4347525Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4347734Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4348289Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4348372Z ^
2025-12-04T10:35:20.4348768Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4348815Z 
2025-12-04T10:35:20.4349425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4349430Z 
2025-12-04T10:35:20.4349481Z 
2025-12-04T10:35:20.4349666Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4350402Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4350409Z 
2025-12-04T10:35:20.4350642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4350825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4350910Z frames [('total', 1)]
2025-12-04T10:35:20.4351016Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4351457Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4351658Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4351740Z graph_break []
2025-12-04T10:35:20.4352074Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4352185Z Traceback (most recent call last):
2025-12-04T10:35:20.4352545Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4352743Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4353162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4353375Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4353815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4353980Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4354415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4354545Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4354999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4355323Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4355772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4355893Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4356312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4356411Z     return self._compile_to_module()
2025-12-04T10:35:20.4356821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4356964Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4357403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4357514Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4357931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4358129Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4358635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4358742Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4359266Z   File "/tmp/tmp1lbqdv8m/dz/cdz4uz74f7wzgmyudsgimgwnztre32ctqknvswv2d6xloqesd2bh.py", line 74, in <module>
2025-12-04T10:35:20.4359658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4359789Z     kernel.precompile(
2025-12-04T10:35:20.4360274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4360372Z     self._precompile_worker()
2025-12-04T10:35:20.4360885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4361044Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4361552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4361727Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4362160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4362369Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4362758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4363045Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4363252Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4363809Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4363882Z ^
2025-12-04T10:35:20.4364282Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4364289Z 
2025-12-04T10:35:20.4364902Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4364906Z 
2025-12-04T10:35:20.4364912Z 
2025-12-04T10:35:20.4365103Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4365884Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4365933Z 
2025-12-04T10:35:20.4366170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4366350Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4366436Z frames [('total', 1)]
2025-12-04T10:35:20.4366538Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4366946Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4367134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4367223Z graph_break []
2025-12-04T10:35:20.4367406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4367503Z frames [('total', 1)]
2025-12-04T10:35:20.4367603Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4367787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4368194Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4368276Z graph_break []
2025-12-04T10:35:20.4368399Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4368734Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4368836Z Traceback (most recent call last):
2025-12-04T10:35:20.4369241Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4369446Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4369900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4370117Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4370557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4370718Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4371161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4371286Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4371786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4372064Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4372513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4372645Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4373055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4373164Z     return self._compile_to_module()
2025-12-04T10:35:20.4373576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4373719Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4374177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4374290Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4374723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4374928Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4375433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4375618Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4376077Z   File "/tmp/tmpnrq81prz/26/c26wb3xem57peeajq4chhxkigcxnyz6uo2d2zp6fmb6yl4ynck5x.py", line 74, in <module>
2025-12-04T10:35:20.4376480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4376577Z     kernel.precompile(
2025-12-04T10:35:20.4377052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4377164Z     self._precompile_worker()
2025-12-04T10:35:20.4377675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4377827Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4378354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4378525Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4378906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4379166Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4379544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4379878Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4380070Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4380668Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4380746Z ^
2025-12-04T10:35:20.4381139Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4381144Z 
2025-12-04T10:35:20.4381755Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4381759Z 
2025-12-04T10:35:20.4381763Z 
2025-12-04T10:35:20.4381945Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4382723Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4382735Z 
2025-12-04T10:35:20.4382962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4383141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4383228Z frames [('total', 1)]
2025-12-04T10:35:20.4383323Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4383724Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4383922Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4383999Z graph_break []
2025-12-04T10:35:20.4384185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4384267Z frames [('total', 1)]
2025-12-04T10:35:20.4384362Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4384552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4384945Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4385026Z graph_break []
2025-12-04T10:35:20.4385213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4385297Z frames [('total', 1)]
2025-12-04T10:35:20.4385393Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4385628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4386027Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4386110Z graph_break []
2025-12-04T10:35:20.4386671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml -
2025-12-04T10:35:20.4386817Z =========================== short test summary info ============================
2025-12-04T10:35:20.4387528Z FAILED [0.3111s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4388078Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4388148Z ^
2025-12-04T10:35:20.4388537Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4388542Z 
2025-12-04T10:35:20.4389147Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4389156Z 
2025-12-04T10:35:20.4389202Z 
2025-12-04T10:35:20.4389384Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4390112Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4390159Z 
2025-12-04T10:35:20.4390386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4390533Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4390708Z ================== 1 failed, 32 deselected, 2 rerun in 2.40s ===================
2025-12-04T10:35:20.4390787Z Got exit code 1
2025-12-04T10:35:20.4390873Z Retrying single test...
2025-12-04T10:35:20.4391276Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml
2025-12-04T10:35:20.4391408Z ============================= test session starts ==============================
2025-12-04T10:35:20.4391742Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4391836Z cachedir: .pytest_cache
2025-12-04T10:35:20.4392286Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4392393Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4392481Z configfile: pytest.ini
2025-12-04T10:35:20.4392939Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4393126Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.4393784Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4393880Z Running 1 items in this shard
2025-12-04T10:35:20.4393894Z 
2025-12-04T10:35:20.4395115Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4396175Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4396541Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4396908Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4397350Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4397736Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4398188Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4398653Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4399146Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4399643Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4400113Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4400531Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4400967Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4401403Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4401875Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4402246Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4402789Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4403275Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4403735Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4404165Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4404655Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4405110Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4405601Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4406101Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4406583Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4407036Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4407509Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4408143Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4408549Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4408966Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4409464Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4409923Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4410408Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4410814Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4411176Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4411590Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4412037Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4412438Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4412942Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4413343Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4413766Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4414260Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4414796Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4415336Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4415792Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4416166Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4416653Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4417023Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4417511Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4417963Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4418395Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4419087Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4419684Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4419984Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4421769Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4422231Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4423114Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4423692Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4424442Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4425056Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4425814Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4426507Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4427032Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4427963Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4428271Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4429027Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4429134Z ('RERUN', {'yellow': True}) [1.7760s] [100%]
2025-12-04T10:35:20.4430358Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4431321Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4431684Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4432052Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4432495Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4432881Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4433333Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4433792Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4434280Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4434776Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4435243Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4435677Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4436141Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4436575Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4436965Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4437337Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4437878Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4438404Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4438872Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4439307Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4439801Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4440252Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4440738Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4441195Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4441673Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4442125Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4442599Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4443005Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4443401Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4443810Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4444305Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4444758Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4445244Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4445648Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4446060Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4446469Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4446882Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4447283Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4447766Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4448173Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4448596Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4449091Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4449614Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4450151Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4450552Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4450923Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4451411Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4451774Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4452255Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4452706Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4453136Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4453774Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4454366Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4454668Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4456453Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4456907Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4457790Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4458366Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4459175Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4459790Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4460537Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4461190Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4461839Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4462766Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4463069Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4463828Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4463936Z ('RERUN', {'yellow': True}) [0.3100s] [100%]
2025-12-04T10:35:20.4465159Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4466125Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4466487Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4466850Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4467284Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4467671Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4468119Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4468576Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4469067Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4469558Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4470024Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4470434Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4470872Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4471307Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4471696Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4472069Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4472611Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4473104Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4473564Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4473996Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4474483Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4474932Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4475423Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4475876Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4476357Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4476810Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4477284Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4477690Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4478089Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4478498Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4478994Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4479455Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4479937Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4480341Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4480709Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4481118Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4481536Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4481936Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4482443Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4482850Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4483273Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4483768Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4484294Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4484832Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4485246Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4485635Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4486150Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4486515Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4486997Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4487450Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4487882Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4488518Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4489114Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4489418Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4491205Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4491668Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4492550Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4493125Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4493886Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4494500Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4495252Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4495902Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4496459Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4497390Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4497703Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4498458Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4498542Z FAILED [0.3099s] [100%]
2025-12-04T10:35:20.4498546Z 
2025-12-04T10:35:20.4498670Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4498995Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4499168Z Traceback (most recent call last):
2025-12-04T10:35:20.4499534Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4499729Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4500184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4500393Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4500827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4500987Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4501422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4501544Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4501994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4502265Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4502712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4502831Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4503239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4503337Z     return self._compile_to_module()
2025-12-04T10:35:20.4503744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4504517Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4504958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4505104Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4505549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4505772Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4506273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4506375Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4506809Z   File "/tmp/tmpur3ppmli/7m/c7m4dwiqluuqqmgfxwny7dlzxpghk3ymbq5zmaxutbg7xqmtwnwg.py", line 74, in <module>
2025-12-04T10:35:20.4507247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4507342Z     kernel.precompile(
2025-12-04T10:35:20.4508008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4508109Z     self._precompile_worker()
2025-12-04T10:35:20.4508611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4508769Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4509275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4509439Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4509818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4510025Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4510398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4510680Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4510869Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4511496Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4511566Z ^
2025-12-04T10:35:20.4511959Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4511964Z 
2025-12-04T10:35:20.4512566Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4512576Z 
2025-12-04T10:35:20.4512580Z 
2025-12-04T10:35:20.4512760Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4513505Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4513512Z 
2025-12-04T10:35:20.4513744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4513931Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4514015Z frames [('total', 1)]
2025-12-04T10:35:20.4514109Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4514515Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4514698Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4514842Z graph_break []
2025-12-04T10:35:20.4515167Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4515266Z Traceback (most recent call last):
2025-12-04T10:35:20.4515730Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4515924Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4516336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4516552Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4516989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4517152Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4517634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4517755Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4518209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4518479Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4518921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4519042Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4519446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4519547Z     return self._compile_to_module()
2025-12-04T10:35:20.4519954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4520094Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4520533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4520639Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4521062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4521302Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4521798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4521901Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4522336Z   File "/tmp/tmpvmp1fq0e/7k/c7kiahdzmh42zojv5a6ezlmsnxgpj7trxyfdp4bpkzqeng536ymo.py", line 74, in <module>
2025-12-04T10:35:20.4522736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4522828Z     kernel.precompile(
2025-12-04T10:35:20.4523299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4523403Z     self._precompile_worker()
2025-12-04T10:35:20.4523905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4524052Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4524561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4524727Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4525110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4525382Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4525796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4526128Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4526319Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4526877Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4526944Z ^
2025-12-04T10:35:20.4527332Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4527337Z 
2025-12-04T10:35:20.4527942Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4527991Z 
2025-12-04T10:35:20.4527996Z 
2025-12-04T10:35:20.4528174Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4528914Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4528921Z 
2025-12-04T10:35:20.4529144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4529333Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4529415Z frames [('total', 1)]
2025-12-04T10:35:20.4529508Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4529912Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4530095Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4530172Z graph_break []
2025-12-04T10:35:20.4530355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4530434Z frames [('total', 1)]
2025-12-04T10:35:20.4530528Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4530710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4531103Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4531184Z graph_break []
2025-12-04T10:35:20.4531346Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4531668Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4531772Z Traceback (most recent call last):
2025-12-04T10:35:20.4532128Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4532329Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4532741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4532948Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4533384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4533542Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4533975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4534095Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4534549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4534821Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4535311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4535468Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4535879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4535974Z     return self._compile_to_module()
2025-12-04T10:35:20.4536389Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4536528Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4536964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4537068Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4537532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4537730Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4542302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4542433Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4542871Z   File "/tmp/tmpx7f8yy_3/hr/chr7fxwmlbid4fzq5dnbajg3fajamjdl26soiv2pwhkegvsitn6q.py", line 74, in <module>
2025-12-04T10:35:20.4543274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4543369Z     kernel.precompile(
2025-12-04T10:35:20.4543854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4543958Z     self._precompile_worker()
2025-12-04T10:35:20.4544475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4544636Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4545143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4545316Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4545766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4545975Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4546360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4546647Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4546858Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4547414Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4547490Z ^
2025-12-04T10:35:20.4547897Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4547903Z 
2025-12-04T10:35:20.4548514Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4548519Z 
2025-12-04T10:35:20.4548523Z 
2025-12-04T10:35:20.4548712Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4549447Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4549498Z 
2025-12-04T10:35:20.4549732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4549916Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4550043Z frames [('total', 1)]
2025-12-04T10:35:20.4550148Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4550547Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4550736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4550829Z graph_break []
2025-12-04T10:35:20.4551006Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4551093Z frames [('total', 1)]
2025-12-04T10:35:20.4551197Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4551383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4551829Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4551914Z graph_break []
2025-12-04T10:35:20.4552092Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4552182Z frames [('total', 1)]
2025-12-04T10:35:20.4552281Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4552468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4552872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4552950Z graph_break []
2025-12-04T10:35:20.4553513Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml -
2025-12-04T10:35:20.4553654Z =========================== short test summary info ============================
2025-12-04T10:35:20.4554372Z FAILED [0.3099s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4554935Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4555009Z ^
2025-12-04T10:35:20.4555402Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4555454Z 
2025-12-04T10:35:20.4556105Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4556110Z 
2025-12-04T10:35:20.4556114Z 
2025-12-04T10:35:20.4556299Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4557029Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4557037Z 
2025-12-04T10:35:20.4557263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4557423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4557589Z ================== 1 failed, 187 deselected, 2 rerun in 2.43s ==================
2025-12-04T10:35:20.4557670Z Got exit code 1
2025-12-04T10:35:20.4557772Z Retrying single test...
2025-12-04T10:35:20.4558175Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml
2025-12-04T10:35:20.4558319Z ============================= test session starts ==============================
2025-12-04T10:35:20.4558615Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4558751Z cachedir: .pytest_cache
2025-12-04T10:35:20.4559205Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4559310Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4559441Z configfile: pytest.ini
2025-12-04T10:35:20.4559912Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4560096Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.4560772Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4560868Z Running 1 items in this shard
2025-12-04T10:35:20.4560872Z 
2025-12-04T10:35:20.4562139Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4563075Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4563439Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4563809Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4564240Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4564639Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4565093Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4565550Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4566105Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4566666Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4567144Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4567509Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4567957Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4568360Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4568745Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4569124Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4569671Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4570131Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4570639Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4571066Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4571608Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4572060Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4572552Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4573001Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4573520Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4573974Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4574405Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4574825Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4575230Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4575637Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4576192Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4576648Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4577144Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4577588Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4577955Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4578375Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4578740Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4579212Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4579661Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4580065Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4580499Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4580999Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4581581Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4582220Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4582636Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4583059Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4583546Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4583929Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4584412Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4584910Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4585343Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4585938Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4586549Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4586852Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4588648Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4589149Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4590055Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4590597Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4591377Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4591957Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4592708Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4593372Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4593898Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4594881Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4595328Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4596104Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4596218Z ('RERUN', {'yellow': True}) [1.7605s] [100%]
2025-12-04T10:35:20.4597486Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4598428Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4598795Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4599182Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4599616Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4600021Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4600478Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4600933Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4601428Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4601961Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4602442Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4602808Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4603249Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4603650Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4604035Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4604419Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4604964Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4605407Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4605968Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4606393Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4606925Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4607375Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4608135Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4608588Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4609176Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4609642Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4610071Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4610493Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4610896Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4611298Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4611798Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4612253Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4612742Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4613203Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4613568Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4613982Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4614348Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4614763Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4615213Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4615619Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4616056Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4616552Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4617043Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4617646Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4618048Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4618480Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4618964Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4619383Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4619870Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4620366Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4620804Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4621404Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4622010Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4622316Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4624107Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4624604Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4625505Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4626038Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4626808Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4627386Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4628140Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4628808Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4629324Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4630303Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4630647Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4631413Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4631525Z ('RERUN', {'yellow': True}) [0.3086s] [100%]
2025-12-04T10:35:20.4632794Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4633728Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4634088Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4634467Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 15
2025-12-04T10:35:20.4634903Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4635306Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4635790Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4636270Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4636770Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4637305Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4637783Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4638153Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4638591Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4639001Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4639387Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4639765Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_0 = r0_index
2025-12-04T10:35:20.4640313Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4640757Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4641268Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4641692Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4642308Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4642760Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4643250Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4643710Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4644227Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4644681Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4645114Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4645533Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4645976Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4646390Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4646892Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4647346Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4647843Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4648241Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4648645Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = 15.0
2025-12-04T10:35:20.4649067Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4649433Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 1e-05
2025-12-04T10:35:20.4649850Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4650296Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4650698Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4651135Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4651626Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4652117Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4652700Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4653102Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4653546Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp33 = -448.0
2025-12-04T10:35:20.4654032Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4654409Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp35 = 448.0
2025-12-04T10:35:20.4654891Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4655384Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4655823Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4656419Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4657018Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4657315Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4659156Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4659615Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4660551Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4661085Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4661844Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4662431Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4663181Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4663837Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4664358Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4665330Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4665701Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4666489Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4666578Z FAILED [0.3090s] [100%]
2025-12-04T10:35:20.4666583Z 
2025-12-04T10:35:20.4666704Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4667038Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4667186Z Traceback (most recent call last):
2025-12-04T10:35:20.4667545Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4667756Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4668170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4668389Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4668827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4668986Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4669423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4669548Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4670016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4670292Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4670737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4670868Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4671322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4671424Z     return self._compile_to_module()
2025-12-04T10:35:20.4671848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4671988Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4672444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4672550Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4672974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4673184Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4673686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4673804Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4674231Z   File "/tmp/tmp2inw7ps3/7n/c7nwg5rj27h3h5u7hcqs2e6kxmi2hnt4w6cfhufcsvbc4eixm7wx.py", line 74, in <module>
2025-12-04T10:35:20.4674621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4674723Z     kernel.precompile(
2025-12-04T10:35:20.4675246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4675347Z     self._precompile_worker()
2025-12-04T10:35:20.4675901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4676106Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4676631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4676795Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4677176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4677393Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4677805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4678110Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4678307Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4678862Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4678944Z ^
2025-12-04T10:35:20.4679354Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4679358Z 
2025-12-04T10:35:20.4679980Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4679984Z 
2025-12-04T10:35:20.4679988Z 
2025-12-04T10:35:20.4680176Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4680920Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4680936Z 
2025-12-04T10:35:20.4681161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4681350Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4681451Z frames [('total', 1)]
2025-12-04T10:35:20.4681599Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4682016Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4682220Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4682304Z graph_break []
2025-12-04T10:35:20.4682635Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4682758Z Traceback (most recent call last):
2025-12-04T10:35:20.4683120Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4683330Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4683743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4683962Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4684412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4684577Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4685020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4685142Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4685668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4685982Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4686473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4686605Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4687013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4687112Z     return self._compile_to_module()
2025-12-04T10:35:20.4687537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4687674Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4688155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4688276Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4688772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4688988Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4689495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4689599Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4690047Z   File "/tmp/tmpjfoamug5/rn/crnti6lnauzipbt65gg7d4qqts3r65qrrpudfnr4ju6pexgwkqoc.py", line 74, in <module>
2025-12-04T10:35:20.4690443Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4690542Z     kernel.precompile(
2025-12-04T10:35:20.4691023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4691121Z     self._precompile_worker()
2025-12-04T10:35:20.4691649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4691802Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4692352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4692529Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4692913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4693136Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4693526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4693812Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4694011Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4694573Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4694655Z ^
2025-12-04T10:35:20.4695053Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4695057Z 
2025-12-04T10:35:20.4695680Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4695687Z 
2025-12-04T10:35:20.4695692Z 
2025-12-04T10:35:20.4695918Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4696733Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4696776Z 
2025-12-04T10:35:20.4697009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4697188Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4697269Z frames [('total', 1)]
2025-12-04T10:35:20.4697369Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4697766Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4697969Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4698044Z graph_break []
2025-12-04T10:35:20.4698221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4698312Z frames [('total', 1)]
2025-12-04T10:35:20.4698448Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4698628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4699076Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4699154Z graph_break []
2025-12-04T10:35:20.4699279Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4699603Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4699701Z Traceback (most recent call last):
2025-12-04T10:35:20.4700065Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4700256Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4700667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4700888Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4701321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4701495Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4701928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4702093Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4702560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4702826Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4703277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4703402Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4703804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4703914Z     return self._compile_to_module()
2025-12-04T10:35:20.4704320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4704456Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4704906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4705012Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4705436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4705653Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4706218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4706325Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4706793Z   File "/tmp/tmp4m9g6blk/fs/cfsxzog75va7fmvrop2h6illmb2t262bbgyjddo4lx2jolzeoqvu.py", line 74, in <module>
2025-12-04T10:35:20.4707190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4707281Z     kernel.precompile(
2025-12-04T10:35:20.4707962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4708066Z     self._precompile_worker()
2025-12-04T10:35:20.4708573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4708725Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4709304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4709470Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4709854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4710055Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4710427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4710727Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4710917Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4711481Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4711553Z ^
2025-12-04T10:35:20.4711942Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4711949Z 
2025-12-04T10:35:20.4712567Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4712572Z 
2025-12-04T10:35:20.4712576Z 
2025-12-04T10:35:20.4712812Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4713556Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4713561Z 
2025-12-04T10:35:20.4713784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4713971Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4714068Z frames [('total', 1)]
2025-12-04T10:35:20.4714165Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4714571Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4714756Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4714831Z graph_break []
2025-12-04T10:35:20.4715012Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4715097Z frames [('total', 1)]
2025-12-04T10:35:20.4715187Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4715378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4715771Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4715918Z graph_break []
2025-12-04T10:35:20.4716096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4716177Z frames [('total', 1)]
2025-12-04T10:35:20.4716269Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4716505Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4716897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4716977Z graph_break []
2025-12-04T10:35:20.4717540Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml -
2025-12-04T10:35:20.4717687Z =========================== short test summary info ============================
2025-12-04T10:35:20.4718392Z FAILED [0.3090s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4718992Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4719069Z ^
2025-12-04T10:35:20.4719454Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4719459Z 
2025-12-04T10:35:20.4720068Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4720072Z 
2025-12-04T10:35:20.4720076Z 
2025-12-04T10:35:20.4720254Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4720987Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4720995Z 
2025-12-04T10:35:20.4721216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4721360Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4721543Z ================== 1 failed, 187 deselected, 2 rerun in 2.41s ==================
2025-12-04T10:35:20.4721625Z Got exit code 1
2025-12-04T10:35:20.4722150Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4722555Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.4722955Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml
2025-12-04T10:35:20.4723104Z ============================= test session starts ==============================
2025-12-04T10:35:20.4723407Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4723504Z cachedir: .pytest_cache
2025-12-04T10:35:20.4723950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4724053Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4724148Z configfile: pytest.ini
2025-12-04T10:35:20.4724604Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4724795Z collecting ... collected 188 items / 33 deselected / 155 selected
2025-12-04T10:35:20.4724914Z stepcurrent: skipping 33 already run items.
2025-12-04T10:35:20.4725011Z Running 155 items in this shard
2025-12-04T10:35:20.4725016Z 
2025-12-04T10:35:20.4726297Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4727322Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4727727Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4728116Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.4728550Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4728941Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4729427Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4729881Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4730378Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4730870Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4731343Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4731718Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4732158Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4732566Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4732956Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4733402Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.4733806Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.4734360Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4734946Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4735525Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4736024Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4736489Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4736919Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4737311Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4737718Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.4738125Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4738527Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.4738919Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4739404Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4739795Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4740234Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4740773Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4741283Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4741822Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4742224Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4742601Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.4743080Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4743455Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.4743932Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4744384Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4744863Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4745461Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4746066Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4746367Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4748403Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4748856Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4749789Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4750364Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4751127Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4751702Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4752488Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4753143Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4753656Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4754647Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4754949Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4755742Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4755864Z ('RERUN', {'yellow': True}) [1.9602s] [  0%]
2025-12-04T10:35:20.4757136Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4758117Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4758477Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4758847Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.4759288Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4759687Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4760135Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4760590Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4761084Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4761615Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4762135Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4762516Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4762952Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4763353Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4763732Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4764146Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.4764549Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.4765094Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4765676Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4766298Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4766746Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4767212Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4767644Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4768031Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4768432Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.4768837Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4769200Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.4769592Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4770025Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4770419Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4770848Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4771344Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4771828Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4772363Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4772806Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4773224Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.4773706Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4774073Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.4774549Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4775057Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4775494Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4776139Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4776739Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4777036Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4779148Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4779649Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4780535Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4781064Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4781821Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4782394Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4783138Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4783790Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4784348Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4785325Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4785700Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4786483Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4786591Z ('RERUN', {'yellow': True}) [0.4948s] [  0%]
2025-12-04T10:35:20.4787840Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4788819Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4789181Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4789556Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.4789988Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4790375Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4790820Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4791271Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4791802Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4792290Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4792759Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4793133Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4793567Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4793966Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4794348Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4794719Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.4795118Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.4795676Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4796332Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4796950Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4797400Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4797860Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4798283Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4798719Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4799078Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.4799482Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4799843Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.4800229Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4800668Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4801061Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4801490Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4801981Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4802469Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4803045Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4803456Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4803831Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.4804313Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4804680Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.4805157Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4805609Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4806042Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4806631Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4807273Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4807570Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4809875Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4810399Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4811289Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4811820Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4812571Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4813146Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4813894Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4814548Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4815128Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4816157Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4816462Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4817218Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4817303Z FAILED [0.4934s] [  0%]
2025-12-04T10:35:20.4817308Z 
2025-12-04T10:35:20.4817424Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4817758Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4817864Z Traceback (most recent call last):
2025-12-04T10:35:20.4818219Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4818416Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4818977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4819230Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4819742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4819901Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4820336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4820453Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4820909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4821178Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4821663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4821789Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4822192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4822295Z     return self._compile_to_module()
2025-12-04T10:35:20.4822703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4822837Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4823279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4823381Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4823795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4823993Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4824488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4824594Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4825029Z   File "/tmp/tmpypvs7ij7/xe/cxeifugu7yk62ihad5gfdz54t3j7qrhu3prwjgfdqr7lhebb5lua.py", line 137, in <module>
2025-12-04T10:35:20.4825469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4825578Z     kernel.precompile(
2025-12-04T10:35:20.4826074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4826173Z     self._precompile_worker()
2025-12-04T10:35:20.4826682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4826833Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4827341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4827508Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4827884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4828093Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4828464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4828749Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4828937Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4829538Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4829656Z ^
2025-12-04T10:35:20.4830044Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4830091Z 
2025-12-04T10:35:20.4830703Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4830710Z 
2025-12-04T10:35:20.4830714Z 
2025-12-04T10:35:20.4830892Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4831625Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4831633Z 
2025-12-04T10:35:20.4831897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4832085Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4832170Z frames [('total', 1)]
2025-12-04T10:35:20.4832264Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4832662Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4832846Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4832925Z graph_break []
2025-12-04T10:35:20.4833255Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4833361Z Traceback (most recent call last):
2025-12-04T10:35:20.4833713Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4833910Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4834323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4834527Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4834963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4835121Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4835610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4835740Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4836209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4836481Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4836923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4837047Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4837450Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4837547Z     return self._compile_to_module()
2025-12-04T10:35:20.4837954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4838091Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4838523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4838632Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4839047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4839287Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4839780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4839922Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4840357Z   File "/tmp/tmp832i0crr/7f/c7fqqsrwyk54jujuqkn57ldog7uskihl3ptocf632lzu2mvbnmtx.py", line 137, in <module>
2025-12-04T10:35:20.4840747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4840835Z     kernel.precompile(
2025-12-04T10:35:20.4841303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4841393Z     self._precompile_worker()
2025-12-04T10:35:20.4841898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4842087Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4842593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4842760Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4843137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4843342Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4843710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4843989Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4848229Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4848864Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4848947Z ^
2025-12-04T10:35:20.4849353Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4849362Z 
2025-12-04T10:35:20.4850038Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4850044Z 
2025-12-04T10:35:20.4850048Z 
2025-12-04T10:35:20.4850243Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4850981Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4850986Z 
2025-12-04T10:35:20.4851229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4851417Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4851505Z frames [('total', 1)]
2025-12-04T10:35:20.4851624Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4852029Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4852224Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4852308Z graph_break []
2025-12-04T10:35:20.4852491Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4852582Z frames [('total', 1)]
2025-12-04T10:35:20.4852681Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4852866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4853264Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4853392Z graph_break []
2025-12-04T10:35:20.4853519Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4853891Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4853994Z Traceback (most recent call last):
2025-12-04T10:35:20.4854361Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4854567Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4855013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4855240Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4855748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4855981Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4856414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4856536Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4856998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4857273Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4857731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4857858Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4858262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4858371Z     return self._compile_to_module()
2025-12-04T10:35:20.4858780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4858918Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4859429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4859539Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4860016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4860216Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4860715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4860833Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4861280Z   File "/tmp/tmpkqiiv75c/t3/ct3mptplrgt2uvirncb4gzuqek6pqzinr2tvwlmepfb3pevb7mwa.py", line 137, in <module>
2025-12-04T10:35:20.4861681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4861772Z     kernel.precompile(
2025-12-04T10:35:20.4862247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4862348Z     self._precompile_worker()
2025-12-04T10:35:20.4862854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4863002Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4863511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4863679Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4864142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4864352Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4864763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4865053Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4865246Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4865856Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4865929Z ^
2025-12-04T10:35:20.4866323Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4866330Z 
2025-12-04T10:35:20.4866975Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4866983Z 
2025-12-04T10:35:20.4866987Z 
2025-12-04T10:35:20.4867168Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4867912Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4867917Z 
2025-12-04T10:35:20.4868140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4868320Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4868408Z frames [('total', 1)]
2025-12-04T10:35:20.4868509Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4868911Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4869105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4869187Z graph_break []
2025-12-04T10:35:20.4869369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4869455Z frames [('total', 1)]
2025-12-04T10:35:20.4869549Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4869735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4870173Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4870262Z graph_break []
2025-12-04T10:35:20.4870438Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4870525Z frames [('total', 1)]
2025-12-04T10:35:20.4870627Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4870814Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4871203Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4871295Z graph_break []
2025-12-04T10:35:20.4871850Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml -
2025-12-04T10:35:20.4871998Z =========================== short test summary info ============================
2025-12-04T10:35:20.4872714Z FAILED [0.4934s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4873315Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4873436Z ^
2025-12-04T10:35:20.4873830Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4873834Z 
2025-12-04T10:35:20.4874447Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4874493Z 
2025-12-04T10:35:20.4874496Z 
2025-12-04T10:35:20.4874679Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4875420Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4875425Z 
2025-12-04T10:35:20.4875655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4875832Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4876077Z ================== 1 failed, 33 deselected, 2 rerun in 2.98s ===================
2025-12-04T10:35:20.4876158Z Got exit code 1
2025-12-04T10:35:20.4876247Z Retrying single test...
2025-12-04T10:35:20.4876661Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml
2025-12-04T10:35:20.4876798Z ============================= test session starts ==============================
2025-12-04T10:35:20.4877101Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4877199Z cachedir: .pytest_cache
2025-12-04T10:35:20.4877644Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4877753Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4877846Z configfile: pytest.ini
2025-12-04T10:35:20.4878311Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4878508Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.4879170Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4879276Z Running 1 items in this shard
2025-12-04T10:35:20.4879281Z 
2025-12-04T10:35:20.4880549Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4881535Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4881898Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4882272Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.4882714Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4883100Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4883556Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4884010Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4884547Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4885042Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4885555Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4885960Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4886417Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4886817Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4887240Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4887614Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.4888024Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.4888569Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4889161Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4889859Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4890320Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4890787Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4891218Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4891672Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4892034Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.4892435Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4892802Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.4893193Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4893632Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4894030Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4894455Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4894956Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4895441Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4896032Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4896434Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4896853Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.4897338Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4897702Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.4898186Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4898681Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4899166Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4899759Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4900356Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4900662Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4902695Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4903228Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4904209Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4904755Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4905535Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4906144Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4906893Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4907551Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4908368Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4909426Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4909745Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4910503Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4910621Z ('RERUN', {'yellow': True}) [1.9511s] [100%]
2025-12-04T10:35:20.4911895Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4912880Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4913239Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4913610Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.4914048Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4914439Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4914898Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4915358Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4915961Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4916457Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4916924Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4917302Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4917737Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4918134Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4918523Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4918894Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.4919301Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.4919842Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4920491Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4921117Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4921569Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4922040Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4922467Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4922907Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4923268Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.4923670Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4924040Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.4924428Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4924869Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4925262Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4925693Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4926199Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4926685Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4927276Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4927680Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4928049Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.4928542Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4928917Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.4929401Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4929855Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4930291Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4930887Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4931530Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4931877Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4933949Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4934415Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4935314Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4935907Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4936665Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4937248Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4938002Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4938698Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4939262Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4940239Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4940551Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4941310Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4941425Z ('RERUN', {'yellow': True}) [0.4958s] [100%]
2025-12-04T10:35:20.4942641Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4943621Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4944021Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.4944430Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.4944876Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4945259Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.4945756Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4946295Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4946795Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4947298Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4947768Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4948144Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.4948584Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4948981Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.4949370Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.4949739Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.4950149Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.4950731Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4951317Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4951901Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4952347Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4952818Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4953244Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4953645Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4954004Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.4954400Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4954818Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.4955201Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4955691Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4956087Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4956515Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4957018Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4957544Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4958086Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4958492Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4958862Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.4959345Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4959713Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.4960199Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4960648Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4961078Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4961713Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4962311Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4962618Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.4964648Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4965108Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4966047Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4966626Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4967421Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4968005Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4968753Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4969444Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4969968Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4970946Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4971260Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4972016Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4972109Z FAILED [0.4905s] [100%]
2025-12-04T10:35:20.4972116Z 
2025-12-04T10:35:20.4972236Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4972565Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4972671Z Traceback (most recent call last):
2025-12-04T10:35:20.4973029Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4973273Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4973696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4973905Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4974345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4974517Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4974952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4975084Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4975538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4975842Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4976312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4976433Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4976844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4976942Z     return self._compile_to_module()
2025-12-04T10:35:20.4977403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4977543Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4978019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4978134Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4978558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4978755Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4979330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4979435Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4979925Z   File "/tmp/tmpq0aachsr/ej/cejtcis6knpjxir6ekapo422t5j5vdbcejxujwvscyxqahq5ixnz.py", line 137, in <module>
2025-12-04T10:35:20.4980322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4980415Z     kernel.precompile(
2025-12-04T10:35:20.4980898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4980995Z     self._precompile_worker()
2025-12-04T10:35:20.4981504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4981657Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4982162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4982334Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4982719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4982926Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4983308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4983593Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4983795Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4984445Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4984519Z ^
2025-12-04T10:35:20.4984915Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4984920Z 
2025-12-04T10:35:20.4985528Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4985533Z 
2025-12-04T10:35:20.4985539Z 
2025-12-04T10:35:20.4985726Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4986463Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4986469Z 
2025-12-04T10:35:20.4986702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4986888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4986968Z frames [('total', 1)]
2025-12-04T10:35:20.4987068Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4987464Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4987711Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4987793Z graph_break []
2025-12-04T10:35:20.4988126Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4988299Z Traceback (most recent call last):
2025-12-04T10:35:20.4988665Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4988863Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4989286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4989496Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4989932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4990140Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4990574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4990703Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4991155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4991428Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4991875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4991996Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4992409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4992514Z     return self._compile_to_module()
2025-12-04T10:35:20.4992927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4993076Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4993528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4993634Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4994189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4994389Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4994897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4995000Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4995445Z   File "/tmp/tmp6kxbri77/3b/c3b3hyz5useo4gsaajnsxvx7zudxdpssk6acfun6imgryb5gwjqd.py", line 137, in <module>
2025-12-04T10:35:20.4995913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4996002Z     kernel.precompile(
2025-12-04T10:35:20.4996481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4996577Z     self._precompile_worker()
2025-12-04T10:35:20.4997086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4997237Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4997748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4997915Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4998426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4998648Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4999102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4999412Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4999622Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5000281Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5000352Z ^
2025-12-04T10:35:20.5000769Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5000777Z 
2025-12-04T10:35:20.5001463Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5001469Z 
2025-12-04T10:35:20.5001476Z 
2025-12-04T10:35:20.5001660Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5002397Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5002404Z 
2025-12-04T10:35:20.5002625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5002810Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5002890Z frames [('total', 1)]
2025-12-04T10:35:20.5002983Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5003386Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5003571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5003650Z graph_break []
2025-12-04T10:35:20.5003825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5003908Z frames [('total', 1)]
2025-12-04T10:35:20.5004012Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5004192Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5004626Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5004711Z graph_break []
2025-12-04T10:35:20.5004827Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5005159Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.5005258Z Traceback (most recent call last):
2025-12-04T10:35:20.5005627Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5005855Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5006290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5006502Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5006946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5007109Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5007550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5007665Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5008258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5008605Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5009104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5009238Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5009653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5009755Z     return self._compile_to_module()
2025-12-04T10:35:20.5010165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5010296Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5010742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5010905Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5011327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5011522Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5012019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5012129Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5012574Z   File "/tmp/tmpapf2407i/3d/c3dmve2uwznbo7hqpizqbwgh5fqjahpm6cco2r7sntvvta5bkn6i.py", line 137, in <module>
2025-12-04T10:35:20.5012966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5013055Z     kernel.precompile(
2025-12-04T10:35:20.5013527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5013629Z     self._precompile_worker()
2025-12-04T10:35:20.5014137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5014286Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5014790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5015026Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5015416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5015645Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5016043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5016328Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5016525Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5017129Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5017201Z ^
2025-12-04T10:35:20.5017588Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5017593Z 
2025-12-04T10:35:20.5018196Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5018204Z 
2025-12-04T10:35:20.5018208Z 
2025-12-04T10:35:20.5018390Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5019209Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5019215Z 
2025-12-04T10:35:20.5019485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5019661Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5019746Z frames [('total', 1)]
2025-12-04T10:35:20.5019838Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5020233Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5020418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5020496Z graph_break []
2025-12-04T10:35:20.5020680Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5020774Z frames [('total', 1)]
2025-12-04T10:35:20.5020865Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5021087Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5021478Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5021557Z graph_break []
2025-12-04T10:35:20.5021732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5021816Z frames [('total', 1)]
2025-12-04T10:35:20.5021912Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5022093Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5022477Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5022550Z graph_break []
2025-12-04T10:35:20.5023111Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml -
2025-12-04T10:35:20.5023261Z =========================== short test summary info ============================
2025-12-04T10:35:20.5023972Z FAILED [0.4905s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5024612Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5024690Z ^
2025-12-04T10:35:20.5025088Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5025093Z 
2025-12-04T10:35:20.5025748Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5025759Z 
2025-12-04T10:35:20.5025763Z 
2025-12-04T10:35:20.5025944Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5026676Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5026683Z 
2025-12-04T10:35:20.5026904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5027057Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5027222Z ================== 1 failed, 187 deselected, 2 rerun in 2.97s ==================
2025-12-04T10:35:20.5027309Z Got exit code 1
2025-12-04T10:35:20.5027393Z Retrying single test...
2025-12-04T10:35:20.5027790Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml
2025-12-04T10:35:20.5027927Z ============================= test session starts ==============================
2025-12-04T10:35:20.5028291Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5028386Z cachedir: .pytest_cache
2025-12-04T10:35:20.5028886Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5028989Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5029077Z configfile: pytest.ini
2025-12-04T10:35:20.5029540Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5029730Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.5030394Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5030486Z Running 1 items in this shard
2025-12-04T10:35:20.5030493Z 
2025-12-04T10:35:20.5031758Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5032742Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5033108Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5033478Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.5033925Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.5034310Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5034762Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5035261Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5035794Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5036302Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.5036776Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5037142Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.5037581Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5037975Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.5038364Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.5038734Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.5039140Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.5039728Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.5040311Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5040943Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5041391Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.5041861Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.5042329Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5042725Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.5043093Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.5043491Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.5043854Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.5044246Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.5044680Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.5045083Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.5045506Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.5046046Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5046580Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.5047117Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.5047520Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.5047894Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.5048383Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.5048750Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.5049231Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.5049680Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.5050108Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.5050703Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.5051343Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.5051680Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5053762Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5054220Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5055112Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5055640Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5056395Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5056971Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5057725Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5058418Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5058937Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5059972Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5062767Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5063531Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5063641Z ('RERUN', {'yellow': True}) [1.9787s] [100%]
2025-12-04T10:35:20.5064868Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5065891Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5066329Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5066705Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.5067143Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.5067527Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5067981Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5068483Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5068980Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5069472Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.5069941Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5070318Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.5070751Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5071157Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.5071541Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.5071917Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.5072322Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.5072930Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.5073513Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5074092Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5074628Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.5075088Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.5075512Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5075950Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.5076316Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.5076721Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.5077124Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.5077514Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.5077957Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.5078349Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.5078773Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.5079264Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5079794Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.5080335Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.5080743Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.5081118Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.5081598Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.5081960Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.5082451Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.5082895Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.5083334Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.5083964Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.5084569Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.5084869Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5086952Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5087462Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5088349Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5088926Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5089683Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5090262Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5091005Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5091697Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5092216Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5093197Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5093503Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5094263Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5094378Z ('RERUN', {'yellow': True}) [0.4932s] [100%]
2025-12-04T10:35:20.5095591Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5096660Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5097017Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5097390Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 150
2025-12-04T10:35:20.5097827Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.5098259Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5098707Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5099222Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5099720Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5100209Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.5100716Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5101088Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_offset = 0
2025-12-04T10:35:20.5101528Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5101925Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     roffset = r0_offset
2025-12-04T10:35:20.5102306Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rindex = r0_index
2025-12-04T10:35:20.5102676Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_2 = r0_index
2025-12-04T10:35:20.5103129Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_1 = r0_index // 15
2025-12-04T10:35:20.5103670Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.5104255Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5104829Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5105279Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.5105736Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.5106167Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5106563Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.5106919Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp5 = 15.0
2025-12-04T10:35:20.5107361Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.5107723Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = 1e-05
2025-12-04T10:35:20.5108264Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.5108708Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.5109102Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.5109608Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.5110101Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5110587Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.5111123Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.5111527Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.5111956Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp19 = -448.0
2025-12-04T10:35:20.5112438Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.5112802Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp21 = 448.0
2025-12-04T10:35:20.5113287Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.5113732Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.5114175Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.5114845Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.5115444Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.5115794Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5117825Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5118288Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5119231Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5119764Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5120517Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5121098Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5121887Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5122538Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5123051Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5124027Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5124379Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5125137Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5125225Z FAILED [0.4940s] [100%]
2025-12-04T10:35:20.5125230Z 
2025-12-04T10:35:20.5125346Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.5125712Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.5125829Z Traceback (most recent call last):
2025-12-04T10:35:20.5126226Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5126425Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5126838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5127044Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5127479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5127637Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5128071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5128193Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5128643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5128917Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5129357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5129478Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5129883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5130023Z     return self._compile_to_module()
2025-12-04T10:35:20.5130434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5130568Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5131009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5131126Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5131541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5131787Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5132280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5132382Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5132813Z   File "/tmp/tmplm7i5550/3g/c3gpe46xaiv3dm27odfp43z4bvt5nzjdnwmjy6b2wc4c7yncq5ji.py", line 137, in <module>
2025-12-04T10:35:20.5133201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5133294Z     kernel.precompile(
2025-12-04T10:35:20.5133761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5133900Z     self._precompile_worker()
2025-12-04T10:35:20.5134412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5134559Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5135060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5135232Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5135608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5135814Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5136181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5136503Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5136699Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5137302Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5137374Z ^
2025-12-04T10:35:20.5137762Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5137768Z 
2025-12-04T10:35:20.5138375Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5138379Z 
2025-12-04T10:35:20.5138387Z 
2025-12-04T10:35:20.5138563Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5139340Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5139350Z 
2025-12-04T10:35:20.5139575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5139754Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5139833Z frames [('total', 1)]
2025-12-04T10:35:20.5139930Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5140373Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5140562Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5140640Z graph_break []
2025-12-04T10:35:20.5140967Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.5141069Z Traceback (most recent call last):
2025-12-04T10:35:20.5141428Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5141620Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5142083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5142290Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5142727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5142884Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5143311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5143431Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5143880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5144198Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5144641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5144763Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5145177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5145279Z     return self._compile_to_module()
2025-12-04T10:35:20.5145714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5145862Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5146306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5146467Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5146882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5147079Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5147578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5147680Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5152308Z   File "/tmp/tmp7fafc3o6/uf/cufki3gdnwymicpsh4qp3xw2dso54p4p3y5ilv7dyzckczi3dyxc.py", line 137, in <module>
2025-12-04T10:35:20.5152733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5152827Z     kernel.precompile(
2025-12-04T10:35:20.5153314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5153418Z     self._precompile_worker()
2025-12-04T10:35:20.5153933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5154087Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5154593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5154833Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5155222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5155437Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5155861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5156155Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5156358Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5157012Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5157082Z ^
2025-12-04T10:35:20.5157480Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5157485Z 
2025-12-04T10:35:20.5158088Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5158093Z 
2025-12-04T10:35:20.5158097Z 
2025-12-04T10:35:20.5158285Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5159094Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5159101Z 
2025-12-04T10:35:20.5159338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5159524Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5159607Z frames [('total', 1)]
2025-12-04T10:35:20.5159713Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5160113Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5160307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5160386Z graph_break []
2025-12-04T10:35:20.5160572Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5160663Z frames [('total', 1)]
2025-12-04T10:35:20.5160765Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5160996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5161398Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5161477Z graph_break []
2025-12-04T10:35:20.5161597Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5161937Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.5162038Z Traceback (most recent call last):
2025-12-04T10:35:20.5162405Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5162597Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5163007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5163222Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5163658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5163832Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5164265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5164386Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5164895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5165165Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5165603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5165736Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5166141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5166291Z     return self._compile_to_module()
2025-12-04T10:35:20.5166705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5166838Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5167291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5167397Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5167822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5168013Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5168552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5168665Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5169104Z   File "/tmp/tmplij4b26i/pe/cpeyjntdaihccxuruy2y24kny7tuxs4v3lxb7wctljlz63lw667t.py", line 137, in <module>
2025-12-04T10:35:20.5169494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5169586Z     kernel.precompile(
2025-12-04T10:35:20.5170061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5170161Z     self._precompile_worker()
2025-12-04T10:35:20.5170664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5170810Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5171446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5171618Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5171998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5172199Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5172570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5172855Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5173045Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5173651Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5173726Z ^
2025-12-04T10:35:20.5174117Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5174125Z 
2025-12-04T10:35:20.5174734Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5174740Z 
2025-12-04T10:35:20.5174744Z 
2025-12-04T10:35:20.5174966Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5175790Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5175800Z 
2025-12-04T10:35:20.5176083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5176268Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5176359Z frames [('total', 1)]
2025-12-04T10:35:20.5176455Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5176923Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5177108Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5177191Z graph_break []
2025-12-04T10:35:20.5177377Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5177463Z frames [('total', 1)]
2025-12-04T10:35:20.5177557Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5177746Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5178138Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5178223Z graph_break []
2025-12-04T10:35:20.5178447Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5178530Z frames [('total', 1)]
2025-12-04T10:35:20.5178624Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5178812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5179283Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5179366Z graph_break []
2025-12-04T10:35:20.5179923Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml -
2025-12-04T10:35:20.5180070Z =========================== short test summary info ============================
2025-12-04T10:35:20.5180784Z FAILED [0.4940s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5181433Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5181514Z ^
2025-12-04T10:35:20.5181902Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5181907Z 
2025-12-04T10:35:20.5182520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5182525Z 
2025-12-04T10:35:20.5182529Z 
2025-12-04T10:35:20.5182708Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5183441Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5183452Z 
2025-12-04T10:35:20.5183679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5183830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5184007Z ================== 1 failed, 187 deselected, 2 rerun in 3.00s ==================
2025-12-04T10:35:20.5184089Z Got exit code 1
2025-12-04T10:35:20.5184612Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5185014Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.5185414Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml
2025-12-04T10:35:20.5185578Z ============================= test session starts ==============================
2025-12-04T10:35:20.5185899Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5185992Z cachedir: .pytest_cache
2025-12-04T10:35:20.5186444Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5186599Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5186690Z configfile: pytest.ini
2025-12-04T10:35:20.5187159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5187352Z collecting ... collected 188 items / 34 deselected / 154 selected
2025-12-04T10:35:20.5187479Z stepcurrent: skipping 34 already run items.
2025-12-04T10:35:20.5187575Z Running 154 items in this shard
2025-12-04T10:35:20.5187579Z 
2025-12-04T10:35:20.5188741Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5189811Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5190181Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5190569Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5190957Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5191421Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5191923Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5192414Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5192841Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5193311Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5193690Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5194047Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5194555Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5195057Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5195570Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5196105Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5196553Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5197012Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5197424Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5197831Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5198304Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5198990Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5199433Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5199925Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5200571Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5201089Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5201424Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5201978Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5202495Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5203103Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5203702Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5204111Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5204517Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5204912Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5205456Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5205946Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5206410Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5206900Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5207393Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5208046Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5208462Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5208869Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5209263Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5210032Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5210485Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5210899Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5211281Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5211708Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5212148Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5212568Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5213021Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5213441Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5213879Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5214377Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5214926Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5215424Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5215846Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5216233Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5216716Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5217103Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5217590Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5218052Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5218578Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5219169Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5219638Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5219937Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5221872Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5222372Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5223264Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5223835Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5224601Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5225174Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5225918Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5226617Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5227135Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5228076Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5228382Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5229154Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5229272Z ('RERUN', {'yellow': True}) [1.7872s] [  0%]
2025-12-04T10:35:20.5230414Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5231395Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5231758Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5232141Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5232534Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5232997Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5233494Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5233995Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5234420Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5234889Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5235279Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5235687Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5236237Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5236736Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5237247Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5237740Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5238223Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5238676Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5239091Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5239491Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5239892Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5240581Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5241037Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5241531Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5242142Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5242724Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5243064Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5243614Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5244140Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5244757Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5245354Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5245755Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5246210Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5246606Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5247188Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5247633Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5248095Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5248586Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5249031Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5249526Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5249939Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5250346Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5250742Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5251423Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5251875Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5252293Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5252681Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5253110Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5253491Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5253957Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5254410Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5254831Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5255276Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5255831Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5256322Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5256817Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5257243Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5257628Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5258157Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5258549Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5259109Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5259574Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5260101Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5260593Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5261110Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5261415Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5263349Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5263805Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5264709Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5265279Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5266098Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5266671Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5267429Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5268118Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5268639Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5269575Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5269886Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5270697Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5270808Z ('RERUN', {'yellow': True}) [0.3363s] [  0%]
2025-12-04T10:35:20.5271960Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5272879Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5273281Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5273676Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5274074Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5274544Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5275011Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5275507Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5275990Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5276462Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5276852Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5277211Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5277754Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5278257Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5278773Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5279268Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5279762Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5280225Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5280642Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5281050Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5281463Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5282195Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5282664Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5283165Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5283780Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5284299Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5284702Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5285261Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5285789Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5286368Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5286964Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5287372Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5287790Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5288187Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5288739Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5289225Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5289699Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5290203Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5290736Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5291242Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5291659Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5292078Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5292477Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5293161Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5293657Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5294083Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5294492Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5294917Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5295308Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5295773Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5296282Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5296715Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5297162Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5297667Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5298163Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5298663Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5299159Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5299554Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5300040Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5300481Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5300964Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5301431Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5301962Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5302493Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5302961Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5303262Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5305204Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5305714Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5306619Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5307155Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5308122Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5308708Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5309461Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5310118Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5310639Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5311583Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5311902Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5312732Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5312821Z FAILED [0.3360s] [  0%]
2025-12-04T10:35:20.5312825Z 
2025-12-04T10:35:20.5312949Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.5313292Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5313393Z Traceback (most recent call last):
2025-12-04T10:35:20.5313763Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5313958Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5314436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5314651Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5315087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5315252Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5315680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5315799Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5316254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5316580Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5317034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5317152Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5317558Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5317658Z     return self._compile_to_module()
2025-12-04T10:35:20.5318064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5318198Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5318637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5318788Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5319211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5319410Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5319911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5320024Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5320466Z   File "/tmp/tmpqn9luur0/ll/cllpwxipiwu4hbnavuptbgwyo4ilte4qmvu6wkoyi6wefkhxumw5.py", line 65, in <module>
2025-12-04T10:35:20.5320866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5320954Z     kernel.precompile(
2025-12-04T10:35:20.5321422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5321525Z     self._precompile_worker()
2025-12-04T10:35:20.5322036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5322187Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5322692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5322898Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5323285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5323490Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5323862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5324150Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5324339Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5324938Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5325008Z ^
2025-12-04T10:35:20.5325398Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5325405Z 
2025-12-04T10:35:20.5326029Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5326034Z 
2025-12-04T10:35:20.5326038Z 
2025-12-04T10:35:20.5326224Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5326972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5327047Z 
2025-12-04T10:35:20.5327269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5327445Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5327531Z frames [('total', 1)]
2025-12-04T10:35:20.5327624Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5328028Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5328213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5328294Z graph_break []
2025-12-04T10:35:20.5328644Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5328745Z Traceback (most recent call last):
2025-12-04T10:35:20.5329144Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5329341Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5329759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5329976Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5330410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5330574Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5331022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5331144Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5331608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5331878Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5332323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5332445Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5332848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5332984Z     return self._compile_to_module()
2025-12-04T10:35:20.5333397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5333531Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5333972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5334083Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5334506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5334754Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5335249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5335353Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5335774Z   File "/tmp/tmpe__hpp0d/7d/c7drbd5txr7yjssejo5fufwfqqyor6aele6uidtiqa6cbylljtad.py", line 65, in <module>
2025-12-04T10:35:20.5336208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5336304Z     kernel.precompile(
2025-12-04T10:35:20.5336773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5336912Z     self._precompile_worker()
2025-12-04T10:35:20.5337419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5337571Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5338073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5338239Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5338613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5338818Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5339242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5339571Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5339761Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5340325Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5340402Z ^
2025-12-04T10:35:20.5340790Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5340797Z 
2025-12-04T10:35:20.5341411Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5341417Z 
2025-12-04T10:35:20.5341421Z 
2025-12-04T10:35:20.5341599Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5342354Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5342371Z 
2025-12-04T10:35:20.5342594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5342776Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5342859Z frames [('total', 1)]
2025-12-04T10:35:20.5342954Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5343398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5343584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5343664Z graph_break []
2025-12-04T10:35:20.5343849Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5343929Z frames [('total', 1)]
2025-12-04T10:35:20.5344022Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5344208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5344602Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5344728Z graph_break []
2025-12-04T10:35:20.5344854Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5345192Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5345296Z Traceback (most recent call last):
2025-12-04T10:35:20.5345660Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5345852Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5346266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5346606Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5347039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5347211Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5347643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5347765Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5348220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5348486Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5348932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5349056Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5349506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5349606Z     return self._compile_to_module()
2025-12-04T10:35:20.5350017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5350157Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5350593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5350703Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5351130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5351319Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5351818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5351922Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5352350Z   File "/tmp/tmp2ix215xe/7q/c7qisuakrtqg5doqq3zk2rlnzbfaw7fv6mukeq7h5g2w52ecquyt.py", line 65, in <module>
2025-12-04T10:35:20.5352748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5352835Z     kernel.precompile(
2025-12-04T10:35:20.5353352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5353445Z     self._precompile_worker()
2025-12-04T10:35:20.5353950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5354100Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5354606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5354768Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5355196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5355397Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5355815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5356107Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5356297Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5356848Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5356958Z ^
2025-12-04T10:35:20.5357352Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5357358Z 
2025-12-04T10:35:20.5357962Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5357967Z 
2025-12-04T10:35:20.5357971Z 
2025-12-04T10:35:20.5358148Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5358896Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5358901Z 
2025-12-04T10:35:20.5359126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5359306Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5359433Z frames [('total', 1)]
2025-12-04T10:35:20.5359528Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5359927Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5360111Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5360192Z graph_break []
2025-12-04T10:35:20.5360365Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5360446Z frames [('total', 1)]
2025-12-04T10:35:20.5360550Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5360733Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5361122Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5361199Z graph_break []
2025-12-04T10:35:20.5361376Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5361471Z frames [('total', 1)]
2025-12-04T10:35:20.5361560Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5361746Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5362136Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5362212Z graph_break []
2025-12-04T10:35:20.5362808Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml -
2025-12-04T10:35:20.5362955Z =========================== short test summary info ============================
2025-12-04T10:35:20.5363670Z FAILED [0.3360s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5364230Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5364299Z ^
2025-12-04T10:35:20.5364751Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5364755Z 
2025-12-04T10:35:20.5365366Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5365372Z 
2025-12-04T10:35:20.5365376Z 
2025-12-04T10:35:20.5365557Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5366354Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5366359Z 
2025-12-04T10:35:20.5366622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5366774Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5366940Z ================== 1 failed, 34 deselected, 2 rerun in 2.49s ===================
2025-12-04T10:35:20.5367021Z Got exit code 1
2025-12-04T10:35:20.5367106Z Retrying single test...
2025-12-04T10:35:20.5367506Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml
2025-12-04T10:35:20.5367640Z ============================= test session starts ==============================
2025-12-04T10:35:20.5367931Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5368016Z cachedir: .pytest_cache
2025-12-04T10:35:20.5368464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5368563Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5368653Z configfile: pytest.ini
2025-12-04T10:35:20.5369744Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5369935Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.5370607Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5370709Z Running 1 items in this shard
2025-12-04T10:35:20.5370713Z 
2025-12-04T10:35:20.5371857Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5372791Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5373165Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5373543Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5373975Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5374456Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5374947Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5375476Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5375968Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5376465Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5376871Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5377254Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5377792Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5378324Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5378872Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5379420Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5379867Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5380308Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5380722Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5381168Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5381561Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5382244Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5382695Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5383192Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5383795Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5384312Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5384652Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5385197Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5385753Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5386316Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5386908Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5387352Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5387750Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5388150Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5388686Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5389133Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5389591Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5390131Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5390578Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5391022Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5391435Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5391836Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5392228Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5392951Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5393403Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5393818Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5394201Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5394624Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5395008Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5395427Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5395929Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5396343Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5396826Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5397329Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5397819Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5398313Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5398767Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5399156Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5399644Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5400026Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5400506Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5401003Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5401537Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5402021Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5402485Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5402782Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5404743Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5405200Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5406140Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5406671Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5407424Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5408146Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5408989Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5409646Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5410162Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5411148Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5411453Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5412212Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5412317Z ('RERUN', {'yellow': True}) [1.7702s] [100%]
2025-12-04T10:35:20.5413458Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5414439Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5414802Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5415180Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5415562Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5416074Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5416534Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5417025Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5417441Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5417905Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5418282Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5418646Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5419223Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5419720Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5420272Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5420766Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5421212Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5421656Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5422069Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5422511Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5422901Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5423580Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5424018Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5424555Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5425162Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5425697Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5426059Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5426605Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5427121Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5427729Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5428330Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5428731Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5429133Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5429531Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5430063Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5430522Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5430981Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5431511Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5431957Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5432402Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5432815Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5433218Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5433660Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5434342Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5434787Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5435202Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5435589Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5436055Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5436436Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5436854Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5437304Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5437715Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5438162Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5438702Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5439202Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5439694Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5440111Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5440498Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5440978Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5441370Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5441855Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5442308Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5442874Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5443359Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5443825Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5444127Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5446138Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5446588Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5447517Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5448045Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5448805Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5449378Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5450185Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5450843Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5451357Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5452285Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5452587Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5453356Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5453467Z ('RERUN', {'yellow': True}) [0.3338s] [100%]
2025-12-04T10:35:20.5454605Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5455570Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5455965Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5456370Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5456752Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5457249Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5457702Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5458189Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5458602Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5463088Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5463581Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5463953Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5464465Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5464971Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5465488Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5466031Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5466485Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5466933Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5467358Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5467769Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5468172Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5468858Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5469309Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5469822Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5470475Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5471002Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5471344Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5471905Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5472466Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5473032Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5473636Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5474037Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5474442Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5474885Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5475423Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5475919Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5476382Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5476875Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5477321Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5477818Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5478240Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5478646Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5479050Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5479733Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5480187Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5480615Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5481008Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5481441Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5481869Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5482299Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5482751Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5483176Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5483623Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5484164Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5484659Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5485156Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5485578Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5486015Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5486498Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5486892Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5487381Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5487854Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5488383Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5488916Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5489388Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5489693Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5491632Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5492086Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5492988Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5493564Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5494326Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5494905Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5495690Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5496400Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5496918Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5497855Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5498229Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5498994Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5499139Z FAILED [0.3335s] [100%]
2025-12-04T10:35:20.5499145Z 
2025-12-04T10:35:20.5499269Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.5499616Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5499719Z Traceback (most recent call last):
2025-12-04T10:35:20.5500079Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5500274Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5500740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5500956Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5501389Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5501548Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5501990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5502116Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5502571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5502840Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5503284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5503419Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5503824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5503930Z     return self._compile_to_module()
2025-12-04T10:35:20.5504381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5504518Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5504965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5505077Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5505521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5505839Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5506444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5506559Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5506995Z   File "/tmp/tmpmohm657b/ov/cov5vl5cspe2peu4mlvzmwz7kf5eg4iuhvcvury5x3haapw5vloh.py", line 65, in <module>
2025-12-04T10:35:20.5507391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5507486Z     kernel.precompile(
2025-12-04T10:35:20.5508231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5508336Z     self._precompile_worker()
2025-12-04T10:35:20.5508844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5509081Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5509592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5509758Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5510140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5510355Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5510725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5511014Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5511204Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5511827Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5511906Z ^
2025-12-04T10:35:20.5512298Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5512303Z 
2025-12-04T10:35:20.5512917Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5512921Z 
2025-12-04T10:35:20.5512925Z 
2025-12-04T10:35:20.5513108Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5513871Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5513879Z 
2025-12-04T10:35:20.5514107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5514290Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5514383Z frames [('total', 1)]
2025-12-04T10:35:20.5514476Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5514874Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5515132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5515213Z graph_break []
2025-12-04T10:35:20.5515577Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5515690Z Traceback (most recent call last):
2025-12-04T10:35:20.5516067Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5516272Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5516685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5516962Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5517407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5517567Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5518007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5518132Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5518589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5518951Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5519440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5519569Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5519974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5520077Z     return self._compile_to_module()
2025-12-04T10:35:20.5520494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5520630Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5521063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5521176Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5521638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5521838Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5522340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5522442Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5522878Z   File "/tmp/tmpucqg84je/xr/cxrau6ed7meq5ylwsfmuzj5zamphy7wpqql4eho35htnhxjphcyh.py", line 65, in <module>
2025-12-04T10:35:20.5523270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5523363Z     kernel.precompile(
2025-12-04T10:35:20.5523832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5523926Z     self._precompile_worker()
2025-12-04T10:35:20.5524443Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5524592Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5525105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5525273Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5525735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5525958Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5526334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5526616Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5526813Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5527368Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5527567Z ^
2025-12-04T10:35:20.5527958Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5527963Z 
2025-12-04T10:35:20.5528571Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5528579Z 
2025-12-04T10:35:20.5528583Z 
2025-12-04T10:35:20.5528767Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5529517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5529564Z 
2025-12-04T10:35:20.5529793Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5529978Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5530071Z frames [('total', 1)]
2025-12-04T10:35:20.5530164Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5530568Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5530762Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5530842Z graph_break []
2025-12-04T10:35:20.5531017Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5531105Z frames [('total', 1)]
2025-12-04T10:35:20.5531198Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5531378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5531818Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5531899Z graph_break []
2025-12-04T10:35:20.5532024Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5532366Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5532473Z Traceback (most recent call last):
2025-12-04T10:35:20.5532849Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5533045Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5533462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5533672Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5534111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5534281Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5534719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5534837Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5535295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5535613Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5536065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5536189Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5536594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5536703Z     return self._compile_to_module()
2025-12-04T10:35:20.5537116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5537327Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5537766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5537871Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5538300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5538493Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5538992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5539158Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5539629Z   File "/tmp/tmpgdt6u5h_/vf/cvferfwynoum5tyqrtzfd3x6onywyyn64wlvg7xkdqifuotltolr.py", line 65, in <module>
2025-12-04T10:35:20.5540035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5540127Z     kernel.precompile(
2025-12-04T10:35:20.5540599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5540699Z     self._precompile_worker()
2025-12-04T10:35:20.5541210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5541365Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5541866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5542033Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5542466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5542674Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5543045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5543336Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5543533Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5544099Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5544171Z ^
2025-12-04T10:35:20.5544567Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5544574Z 
2025-12-04T10:35:20.5545198Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5545204Z 
2025-12-04T10:35:20.5545208Z 
2025-12-04T10:35:20.5545391Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5546235Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5546240Z 
2025-12-04T10:35:20.5546476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5546659Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5546744Z frames [('total', 1)]
2025-12-04T10:35:20.5546838Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5547252Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5547437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5547564Z graph_break []
2025-12-04T10:35:20.5547757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5547844Z frames [('total', 1)]
2025-12-04T10:35:20.5547954Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5548143Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5548540Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5548628Z graph_break []
2025-12-04T10:35:20.5548810Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5548890Z frames [('total', 1)]
2025-12-04T10:35:20.5548996Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5549226Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5549627Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5549708Z graph_break []
2025-12-04T10:35:20.5550265Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml -
2025-12-04T10:35:20.5550417Z =========================== short test summary info ============================
2025-12-04T10:35:20.5551147Z FAILED [0.3335s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5551700Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5551791Z ^
2025-12-04T10:35:20.5552233Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5552240Z 
2025-12-04T10:35:20.5552859Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5552863Z 
2025-12-04T10:35:20.5552867Z 
2025-12-04T10:35:20.5553061Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5553815Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5553820Z 
2025-12-04T10:35:20.5554043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5554199Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5554385Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ==================
2025-12-04T10:35:20.5554466Z Got exit code 1
2025-12-04T10:35:20.5554565Z Retrying single test...
2025-12-04T10:35:20.5554970Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml
2025-12-04T10:35:20.5555110Z ============================= test session starts ==============================
2025-12-04T10:35:20.5555463Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5555555Z cachedir: .pytest_cache
2025-12-04T10:35:20.5556005Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5556115Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5556204Z configfile: pytest.ini
2025-12-04T10:35:20.5556675Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5556872Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.5557589Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5557694Z Running 1 items in this shard
2025-12-04T10:35:20.5557700Z 
2025-12-04T10:35:20.5558851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5559790Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5560204Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5560594Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5560982Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5561446Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5561920Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5562422Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5562896Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5563368Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5563751Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5564129Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5564639Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5565147Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5565667Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5566204Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5566667Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5567164Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5567595Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5567996Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5568393Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5569092Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5569585Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5570102Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5570716Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5571236Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5571621Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5572176Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5572703Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5573278Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5573882Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5574335Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5574733Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5575150Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5575718Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5576203Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5576665Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5577168Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5577626Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5578084Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5578577Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5578986Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5579455Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5580150Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5580607Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5581073Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5581460Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5581900Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5582291Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5582710Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5583227Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5583651Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5584104Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5584614Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5585109Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5585666Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5586096Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5586499Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5586986Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5587390Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5587883Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5588336Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5588878Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5589367Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5589842Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5590189Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5592137Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5592637Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5593537Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5594078Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5594879Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5595463Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5596275Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5596948Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5597466Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5598461Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5598779Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5599553Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5599678Z ('RERUN', {'yellow': True}) [1.7849s] [100%]
2025-12-04T10:35:20.5600824Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5601762Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5602131Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5602560Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5602952Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5603403Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5603877Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5604369Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5604844Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5605463Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5606631Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5607483Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5608611Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5609811Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5610929Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5612127Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5613179Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5614186Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5615222Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5616216Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5617123Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5618320Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5619598Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5620645Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5621865Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5623087Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5624043Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5625109Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5626289Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5627486Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5628765Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5629944Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5630870Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5631783Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5632824Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5633920Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5634983Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5636063Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5637109Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5638117Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5639084Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5640011Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5640998Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5642191Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5643433Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5644409Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5645325Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5646293Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5647223Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5648141Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5649128Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5650159Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5651131Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5652184Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5653299Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5654434Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5655469Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5656441Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5657426Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5658406Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5659478Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5660538Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5661629Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5662756Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5663820Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5664694Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5667078Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5669565Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5671012Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5672530Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5673924Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5675410Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5676896Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5678408Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5679723Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5681287Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5682631Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5683799Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5684821Z ('RERUN', {'yellow': True}) [0.3338s] [100%]
2025-12-04T10:35:20.5686221Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5688387Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5689784Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 10
2025-12-04T10:35:20.5690635Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.5691559Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5692500Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5693521Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5694581Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5695598Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.5696592Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5697546Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5698396Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.5699440Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5700546Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5701709Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5702819Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5703865Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5704876Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5705917Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5706862Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5707977Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5709164Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5710396Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5711593Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5712905Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5714222Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5715241Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.5716311Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5717645Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5718846Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5720121Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5721228Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5722145Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5723053Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5724100Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5725187Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5726202Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5727322Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5728371Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5729373Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5730351Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5731338Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5732249Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.5733436Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5734673Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5735646Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5736633Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.5737557Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5738473Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.5739441Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5740436Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5741417Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5742431Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5743488Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5744602Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5745703Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5746773Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5747692Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.5748686Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5749676Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.5750660Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5751756Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5752847Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5753971Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5755043Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5756002Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5758337Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5760845Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5762297Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5763826Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5765229Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5766657Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5768142Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5769660Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5770945Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5772497Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5773828Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5775011Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5776018Z FAILED [0.3324s] [100%]
2025-12-04T10:35:20.5776165Z 
2025-12-04T10:35:20.5776293Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.5776908Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5777469Z Traceback (most recent call last):
2025-12-04T10:35:20.5778017Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5778685Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5779447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5780191Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5780947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5781706Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5782402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5783069Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5783749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5784581Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5785414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5786189Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5786829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5787442Z     return self._compile_to_module()
2025-12-04T10:35:20.5788047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5793700Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5794457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5795232Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5795865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5796604Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5797501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5798224Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5798880Z   File "/tmp/tmpyteabnu3/fx/cfxidyigfwltuoh653wimpocbemcp5kcliomzxxd6gsqscd7xypm.py", line 65, in <module>
2025-12-04T10:35:20.5799838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5800441Z     kernel.precompile(
2025-12-04T10:35:20.5801064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5801750Z     self._precompile_worker()
2025-12-04T10:35:20.5802438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5803210Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5803980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5804778Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5805444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5806197Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5806947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5808105Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5808701Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5809555Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5810305Z ^
2025-12-04T10:35:20.5810794Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5811386Z 
2025-12-04T10:35:20.5811997Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5812714Z 
2025-12-04T10:35:20.5812718Z 
2025-12-04T10:35:20.5812905Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5813945Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5814813Z 
2025-12-04T10:35:20.5815040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5815662Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5816065Z frames [('total', 1)]
2025-12-04T10:35:20.5816303Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5816878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5817580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5817961Z graph_break []
2025-12-04T10:35:20.5818430Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5818988Z Traceback (most recent call last):
2025-12-04T10:35:20.5819582Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5820250Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5820972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5821784Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5822540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5823258Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5823967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5824640Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5825320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5826169Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5827004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5827692Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5828328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5828962Z     return self._compile_to_module()
2025-12-04T10:35:20.5829573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5830229Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5831070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5831740Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5832377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5833105Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5833921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5834715Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5835355Z   File "/tmp/tmp44k0439s/5y/c5ye4jycncnvw4gwd7j4aup2rf4bhoqelymqqbtpxdumzremnr5q.py", line 65, in <module>
2025-12-04T10:35:20.5836350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5836951Z     kernel.precompile(
2025-12-04T10:35:20.5837576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5838295Z     self._precompile_worker()
2025-12-04T10:35:20.5839011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5839843Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5840711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5841500Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5842159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5842889Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5843587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5844360Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5844959Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5845850Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5846644Z ^
2025-12-04T10:35:20.5847135Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5847650Z 
2025-12-04T10:35:20.5848254Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5848974Z 
2025-12-04T10:35:20.5848977Z 
2025-12-04T10:35:20.5849167Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5850208Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5851065Z 
2025-12-04T10:35:20.5851287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5851823Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5852209Z frames [('total', 1)]
2025-12-04T10:35:20.5852446Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5853024Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5853730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5854112Z graph_break []
2025-12-04T10:35:20.5854459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5854849Z frames [('total', 1)]
2025-12-04T10:35:20.5855087Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5855439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5856136Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5856729Z graph_break []
2025-12-04T10:35:20.5856978Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5857557Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5858165Z Traceback (most recent call last):
2025-12-04T10:35:20.5858719Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5859442Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5860174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5860914Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5861677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5862386Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5863143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5863822Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5864518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5865359Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5866281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5866965Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5867609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5868235Z     return self._compile_to_module()
2025-12-04T10:35:20.5868896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5869568Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5870251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5870917Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5871550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5872280Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5873084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5873808Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5874457Z   File "/tmp/tmpmx1tjsyz/jy/cjya57nhlljiseo34fttqqnbztnnw6fpb4kfoonooc3w5yuzpswn.py", line 65, in <module>
2025-12-04T10:35:20.5875406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5876012Z     kernel.precompile(
2025-12-04T10:35:20.5876641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5877324Z     self._precompile_worker()
2025-12-04T10:35:20.5878002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5878827Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5879596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5880386Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5881042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5881754Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5882456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5883273Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5883868Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5884727Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5885475Z ^
2025-12-04T10:35:20.5886006Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5886526Z 
2025-12-04T10:35:20.5887133Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5887908Z 
2025-12-04T10:35:20.5887912Z 
2025-12-04T10:35:20.5888093Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5889140Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5889998Z 
2025-12-04T10:35:20.5890240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5890761Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5891145Z frames [('total', 1)]
2025-12-04T10:35:20.5891386Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5891956Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5892665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5893095Z graph_break []
2025-12-04T10:35:20.5893405Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5893786Z frames [('total', 1)]
2025-12-04T10:35:20.5894030Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5894393Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5895090Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5895696Z graph_break []
2025-12-04T10:35:20.5896047Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5896427Z frames [('total', 1)]
2025-12-04T10:35:20.5896666Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5897026Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5897726Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5898314Z graph_break []
2025-12-04T10:35:20.5898998Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml -
2025-12-04T10:35:20.5899959Z =========================== short test summary info ============================
2025-12-04T10:35:20.5901009Z FAILED [0.3324s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5902391Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5903132Z ^
2025-12-04T10:35:20.5903623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5904131Z 
2025-12-04T10:35:20.5904745Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5905508Z 
2025-12-04T10:35:20.5905512Z 
2025-12-04T10:35:20.5905720Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5906787Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5907652Z 
2025-12-04T10:35:20.5908051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5908683Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5909117Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ==================
2025-12-04T10:35:20.5909482Z Got exit code 1
2025-12-04T10:35:20.5910241Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5911244Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.5912110Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml
2025-12-04T10:35:20.5912765Z ============================= test session starts ==============================
2025-12-04T10:35:20.5913325Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5913824Z cachedir: .pytest_cache
2025-12-04T10:35:20.5914419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5915084Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5915370Z configfile: pytest.ini
2025-12-04T10:35:20.5916042Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5916807Z collecting ... collected 188 items / 35 deselected / 153 selected
2025-12-04T10:35:20.5917234Z stepcurrent: skipping 35 already run items.
2025-12-04T10:35:20.5917548Z Running 153 items in this shard
2025-12-04T10:35:20.5917724Z 
2025-12-04T10:35:20.5918961Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5921369Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5922902Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5923760Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.5924638Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5925717Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5926778Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5927851Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5928960Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.5930098Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5931058Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5932096Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5933195Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.5934209Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.5935321Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5936378Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5937387Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5938362Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5939373Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5940287Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.5941269Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.5942452Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5943781Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5945059Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5946199Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5947175Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.5948075Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.5948989Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.5949900Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.5950845Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.5951816Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.5952793Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.5953766Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.5954829Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5955990Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.5957068Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.5958080Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.5959010Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.5960050Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.5961032Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.5962020Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.5963077Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.5964244Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.5965435Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.5966573Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.5967725Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.5968737Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.5971389Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5974165Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5975664Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5977187Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5978593Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5980144Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5981583Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5983092Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5984368Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5986110Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5987635Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5988824Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5989809Z ('RERUN', {'yellow': True}) [1.9043s] [  0%]
2025-12-04T10:35:20.5991287Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5993673Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5995246Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5996157Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.5997038Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5997988Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5999014Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6000083Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6001179Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6002302Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6003297Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6004329Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6005433Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6006493Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6007563Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6008946Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6009957Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6010922Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6011948Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6012864Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6013800Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6014976Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6016304Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6017669Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6018813Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6019909Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6020809Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6021721Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6022638Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6023538Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6024515Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6025498Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6026469Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6027635Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6028746Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6029828Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6030844Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6031878Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6032874Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6033859Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6034848Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6035911Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6037131Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6038327Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6039369Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6040525Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6041574Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6044280Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6047060Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6048512Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6050046Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6051451Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6052938Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6054378Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6055899Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6057177Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6058920Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6060454Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6061637Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6062663Z ('RERUN', {'yellow': True}) [0.4090s] [  0%]
2025-12-04T10:35:20.6064109Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6066509Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6068046Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6068905Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6069830Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6070790Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6071813Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6072881Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6073981Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6075056Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6076030Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6077068Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6078163Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6079263Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6080339Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6081396Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6082416Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6083429Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6084363Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6085282Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6086223Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6087397Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6088777Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6090062Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6091205Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6092178Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6093081Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6093999Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6094963Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6095873Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6096844Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6097827Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6098804Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6099926Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6100422Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6100902Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6101322Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6101897Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6102396Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6102779Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6103273Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6103730Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6104373Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6104872Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6105304Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6105904Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6106279Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6108894Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6109471Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6110373Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6110914Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6111669Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6112252Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6113004Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6113665Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6114252Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6115433Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6115751Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6116515Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6116665Z FAILED [0.4081s] [  0%]
2025-12-04T10:35:20.6116670Z 
2025-12-04T10:35:20.6116789Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.6117136Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6117239Z Traceback (most recent call last):
2025-12-04T10:35:20.6117599Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6117802Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6118217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6118491Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6118929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6119089Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6119532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6119652Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6120113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6120382Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6120867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6121004Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6121412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6121510Z     return self._compile_to_module()
2025-12-04T10:35:20.6121923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6122062Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6122508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6122616Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6123034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6123237Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6123740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6123849Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6124272Z   File "/tmp/tmpq6lie_52/ff/cffniubmfohrettsmmh2tfk6sstfl6nhgon5b6rvek6i4xyiqnxn.py", line 137, in <module>
2025-12-04T10:35:20.6124664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6124807Z     kernel.precompile(
2025-12-04T10:35:20.6125281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6125378Z     self._precompile_worker()
2025-12-04T10:35:20.6125892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6126043Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6126552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6126761Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6127140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6127353Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6127766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6128056Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6128246Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6128940Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6129064Z ^
2025-12-04T10:35:20.6129457Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6129462Z 
2025-12-04T10:35:20.6130073Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6130080Z 
2025-12-04T10:35:20.6130085Z 
2025-12-04T10:35:20.6136623Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6137400Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6137415Z 
2025-12-04T10:35:20.6137728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6137922Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6138023Z frames [('total', 1)]
2025-12-04T10:35:20.6138123Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6138530Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6138729Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6138818Z graph_break []
2025-12-04T10:35:20.6139222Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6139329Z Traceback (most recent call last):
2025-12-04T10:35:20.6139693Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6139895Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6140318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6140533Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6140979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6141147Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6141642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6141765Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6142219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6142501Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6142955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6143085Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6143536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6143638Z     return self._compile_to_module()
2025-12-04T10:35:20.6144053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6144190Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6144632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6144748Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6145171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6145416Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6145914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6146021Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6146482Z   File "/tmp/tmp3fletsr9/pq/cpqqhfzeruvcohwwqrokjmbvx5nocxo6h7vgenas534kh2hcw5qa.py", line 137, in <module>
2025-12-04T10:35:20.6146878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6146975Z     kernel.precompile(
2025-12-04T10:35:20.6147451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6147548Z     self._precompile_worker()
2025-12-04T10:35:20.6148064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6148260Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6148767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6148941Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6149320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6149533Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6149906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6150190Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6150388Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6151082Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6151167Z ^
2025-12-04T10:35:20.6151557Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6151562Z 
2025-12-04T10:35:20.6152215Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6152227Z 
2025-12-04T10:35:20.6152231Z 
2025-12-04T10:35:20.6152414Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6153157Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6153165Z 
2025-12-04T10:35:20.6153397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6153580Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6153743Z frames [('total', 1)]
2025-12-04T10:35:20.6153842Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6154242Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6154436Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6154517Z graph_break []
2025-12-04T10:35:20.6154697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6154790Z frames [('total', 1)]
2025-12-04T10:35:20.6154889Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6155072Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6155471Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6155594Z graph_break []
2025-12-04T10:35:20.6155727Z =================================== FAILURES ===================================
2025-12-04T10:35:20.6156114Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6156219Z Traceback (most recent call last):
2025-12-04T10:35:20.6156585Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6156783Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6157196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6157416Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6157900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6158216Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6158650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6158773Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6159234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6159508Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6159955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6160078Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6160484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6160598Z     return self._compile_to_module()
2025-12-04T10:35:20.6161008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6161156Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6161594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6161703Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6162179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6162379Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6162877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6162987Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6163428Z   File "/tmp/tmp57nr7q4e/3r/c3rs2r2zgrzp53qlhjntemg26khskch3b6jysnyonxqxg2qfehvj.py", line 137, in <module>
2025-12-04T10:35:20.6163831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6163963Z     kernel.precompile(
2025-12-04T10:35:20.6164437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6164545Z     self._precompile_worker()
2025-12-04T10:35:20.6165054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6165214Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6165743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6165935Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6166370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6166579Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6166954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6167245Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6167439Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6168139Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6168217Z ^
2025-12-04T10:35:20.6168613Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6168620Z 
2025-12-04T10:35:20.6169281Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6169288Z 
2025-12-04T10:35:20.6169292Z 
2025-12-04T10:35:20.6169474Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6170268Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6170275Z 
2025-12-04T10:35:20.6170498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6170687Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6170779Z frames [('total', 1)]
2025-12-04T10:35:20.6170880Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6171291Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6171480Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6171563Z graph_break []
2025-12-04T10:35:20.6171749Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6171835Z frames [('total', 1)]
2025-12-04T10:35:20.6171938Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6172170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6172563Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6172652Z graph_break []
2025-12-04T10:35:20.6172832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6172915Z frames [('total', 1)]
2025-12-04T10:35:20.6173018Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6173205Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6173595Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6173729Z graph_break []
2025-12-04T10:35:20.6174325Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml -
2025-12-04T10:35:20.6174480Z =========================== short test summary info ============================
2025-12-04T10:35:20.6175203Z FAILED [0.4081s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6175927Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6176046Z ^
2025-12-04T10:35:20.6176439Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6176446Z 
2025-12-04T10:35:20.6177062Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6177067Z 
2025-12-04T10:35:20.6177071Z 
2025-12-04T10:35:20.6177255Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6178007Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6178012Z 
2025-12-04T10:35:20.6178237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6178387Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.6178609Z ================== 1 failed, 35 deselected, 2 rerun in 2.76s ===================
2025-12-04T10:35:20.6178693Z Got exit code 1
2025-12-04T10:35:20.6178788Z Retrying single test...
2025-12-04T10:35:20.6179247Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml
2025-12-04T10:35:20.6179383Z ============================= test session starts ==============================
2025-12-04T10:35:20.6179686Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.6179775Z cachedir: .pytest_cache
2025-12-04T10:35:20.6180221Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.6180327Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.6180415Z configfile: pytest.ini
2025-12-04T10:35:20.6180882Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.6181072Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.6181746Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6181846Z Running 1 items in this shard
2025-12-04T10:35:20.6181852Z 
2025-12-04T10:35:20.6183130Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6184204Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6184610Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6184996Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6185386Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6185837Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6186296Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6186790Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6187331Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6187804Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6188183Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6188725Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6189169Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6189675Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6190170Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6190621Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6191078Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6191491Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6191907Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6192298Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6192730Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6193374Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6194025Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6194612Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6195060Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6195484Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6195955Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6196370Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6196762Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6197168Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6197624Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6198038Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6198524Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6199035Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6199529Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6200012Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6200436Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6200868Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6201366Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6201754Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6202245Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6202703Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6203308Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6203798Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6204235Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6204845Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6205189Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6207491Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6208328Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6209261Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6209795Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6210662Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6211242Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6212004Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6212658Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6213236Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6214413Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6214723Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6215494Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6215606Z ('RERUN', {'yellow': True}) [1.8966s] [100%]
2025-12-04T10:35:20.6216846Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6217919Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6218342Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6218731Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6219168Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6219632Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6220090Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6220656Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6221153Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6221621Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6222006Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6222546Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6223043Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6223508Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6224004Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6224462Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6224906Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6225371Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6225779Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6226172Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6226602Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6227247Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6227833Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6228418Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6228875Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6229283Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6229709Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6230132Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6230511Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6230927Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6231377Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6231832Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6232287Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6232790Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6233289Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6233767Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6234228Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6234627Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6235116Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6235510Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6235997Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6236520Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6237131Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6237622Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6238064Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6238667Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6238981Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6241267Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6241732Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6242625Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6243209Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6243970Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6244554Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6245306Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6246033Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6246585Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6247652Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6247968Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6248774Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6248888Z ('RERUN', {'yellow': True}) [0.4071s] [100%]
2025-12-04T10:35:20.6250131Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6251198Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6251563Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6251950Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6252347Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6252803Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6253308Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6253813Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6254311Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6254800Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6255179Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6255835Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6256296Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6256762Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6257268Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6257761Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6258217Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6258636Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6259083Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6259486Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6259911Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6260604Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6261193Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6261775Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6262231Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6262647Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6263041Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6263463Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6263849Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6264266Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6264761Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6265183Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6265625Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6266132Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6266634Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6267150Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6267582Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6267973Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6268466Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6268854Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6269381Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6269849Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6270452Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6270951Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6271387Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6272030Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6272340Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6274585Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6275054Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6275949Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6276537Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6277294Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6277878Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6278628Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6279345Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6279871Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6280944Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6281335Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6282108Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6282214Z FAILED [0.4077s] [100%]
2025-12-04T10:35:20.6282219Z 
2025-12-04T10:35:20.6282345Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.6282694Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6282799Z Traceback (most recent call last):
2025-12-04T10:35:20.6283176Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6283390Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6283848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6284071Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6284526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6284697Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6285151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6285277Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6285742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6286038Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6286497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6286638Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6287054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6287155Z     return self._compile_to_module()
2025-12-04T10:35:20.6287628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6287774Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6288221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6288343Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6288768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6288985Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6289524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6289632Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6290086Z   File "/tmp/tmplaf0delm/tx/ctxm7ilb5wrqs7qgfxksua4p4sl66noiuw7no2bc37qrpem5z4bc.py", line 137, in <module>
2025-12-04T10:35:20.6290488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6290596Z     kernel.precompile(
2025-12-04T10:35:20.6291079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6291178Z     self._precompile_worker()
2025-12-04T10:35:20.6291704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6291900Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6292419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6292601Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6292984Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6293210Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6293590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6293886Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6294095Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6294843Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6294925Z ^
2025-12-04T10:35:20.6295320Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6295325Z 
2025-12-04T10:35:20.6295940Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6295945Z 
2025-12-04T10:35:20.6295958Z 
2025-12-04T10:35:20.6296141Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6296887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6296896Z 
2025-12-04T10:35:20.6297142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6297334Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6297418Z frames [('total', 1)]
2025-12-04T10:35:20.6297526Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6297932Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6298177Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6298258Z graph_break []
2025-12-04T10:35:20.6298593Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6298711Z Traceback (most recent call last):
2025-12-04T10:35:20.6299109Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6299316Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6299748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6300014Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6300467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6300633Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6301080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6301213Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6301674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6301969Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6302460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6302588Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6303011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6303117Z     return self._compile_to_module()
2025-12-04T10:35:20.6303547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6303688Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6304131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6304249Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6304718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6304919Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6305437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6305550Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6306017Z   File "/tmp/tmprhc1pp3a/gk/cgkk7yqjytg6b4cjserdwc3fycwq2i4pemvugnq7ym5of5cfywkh.py", line 137, in <module>
2025-12-04T10:35:20.6306418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6306515Z     kernel.precompile(
2025-12-04T10:35:20.6306994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6307094Z     self._precompile_worker()
2025-12-04T10:35:20.6307622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6308018Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6308620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6308802Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6309272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6309478Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6309859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6310142Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6310344Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6311038Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6311168Z ^
2025-12-04T10:35:20.6311565Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6311570Z 
2025-12-04T10:35:20.6312184Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6312189Z 
2025-12-04T10:35:20.6312193Z 
2025-12-04T10:35:20.6312383Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6313131Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6313191Z 
2025-12-04T10:35:20.6313425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6313613Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6313698Z frames [('total', 1)]
2025-12-04T10:35:20.6313804Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6314204Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6314390Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6314479Z graph_break []
2025-12-04T10:35:20.6314658Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6314746Z frames [('total', 1)]
2025-12-04T10:35:20.6314845Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6315031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6315496Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6315578Z graph_break []
2025-12-04T10:35:20.6315698Z =================================== FAILURES ===================================
2025-12-04T10:35:20.6316041Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6316147Z Traceback (most recent call last):
2025-12-04T10:35:20.6316523Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6316725Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6317140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6317359Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6317800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6317962Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6318405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6318524Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6319031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6319305Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6319748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6319881Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6320292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6320400Z     return self._compile_to_module()
2025-12-04T10:35:20.6320879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6321019Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6321472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6321583Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6322004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6322209Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6322710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6322867Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6323284Z   File "/tmp/tmpk_2vk0w0/g5/cg5n4digy4cyupl5slyiujzzyjq77i2preqmwg6r4th3vq7cqwqd.py", line 137, in <module>
2025-12-04T10:35:20.6323682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6323783Z     kernel.precompile(
2025-12-04T10:35:20.6324257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6324364Z     self._precompile_worker()
2025-12-04T10:35:20.6324879Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6325031Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6325546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6325763Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6326152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6326379Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6326759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6327055Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6327248Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6327941Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6328030Z ^
2025-12-04T10:35:20.6328432Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6328436Z 
2025-12-04T10:35:20.6329058Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6329063Z 
2025-12-04T10:35:20.6329067Z 
2025-12-04T10:35:20.6329258Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6330067Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6330073Z 
2025-12-04T10:35:20.6330301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6330485Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6330583Z frames [('total', 1)]
2025-12-04T10:35:20.6330688Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6331107Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6331339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6331428Z graph_break []
2025-12-04T10:35:20.6331622Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6331708Z frames [('total', 1)]
2025-12-04T10:35:20.6331806Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6332005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6332411Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6332497Z graph_break []
2025-12-04T10:35:20.6332687Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6332898Z frames [('total', 1)]
2025-12-04T10:35:20.6333004Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6333191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6333592Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6333683Z graph_break []
2025-12-04T10:35:20.6334246Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml -
2025-12-04T10:35:20.6334400Z =========================== short test summary info ============================
2025-12-04T10:35:20.6335127Z FAILED [0.4077s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6335894Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6335996Z ^
2025-12-04T10:35:20.6336401Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6336406Z 
2025-12-04T10:35:20.6337024Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6337028Z 
2025-12-04T10:35:20.6337035Z 
2025-12-04T10:35:20.6337219Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6338080Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6338093Z 
2025-12-04T10:35:20.6338323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6338481Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.6338658Z ================== 1 failed, 187 deselected, 2 rerun in 2.75s ==================
2025-12-04T10:35:20.6338740Z Got exit code 1
2025-12-04T10:35:20.6338827Z Retrying single test...
2025-12-04T10:35:20.6339293Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml
2025-12-04T10:35:20.6339487Z ============================= test session starts ==============================
2025-12-04T10:35:20.6339788Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.6339878Z cachedir: .pytest_cache
2025-12-04T10:35:20.6340323Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.6340426Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.6340521Z configfile: pytest.ini
2025-12-04T10:35:20.6340988Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.6341227Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.6341905Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6342004Z Running 1 items in this shard
2025-12-04T10:35:20.6342011Z 
2025-12-04T10:35:20.6343243Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6344313Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6344721Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6345099Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6345494Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6345985Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6346470Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6347003Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6347504Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6347981Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6348360Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6348913Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6349355Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6349820Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6350321Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6350770Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6351269Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6351685Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6352096Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6352500Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6352970Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6353623Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6354208Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6354796Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6355288Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6355698Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6356089Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6356506Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6356891Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6357295Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6357752Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6358216Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6358664Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6359182Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6359678Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6360165Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6360588Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6360984Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6361505Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6361896Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6362473Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6362936Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6363550Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6364054Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6364539Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6365157Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6365472Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6367778Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6368281Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6369190Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6369765Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6370531Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6371125Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6371883Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6372558Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6373082Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6374158Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6374510Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6375281Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6375393Z ('RERUN', {'yellow': True}) [1.8776s] [100%]
2025-12-04T10:35:20.6376679Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6377799Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6378161Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6378546Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6378933Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6379481Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6379941Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6380434Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6380943Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6381511Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6381901Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6382497Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6382952Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6383419Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6383912Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6384380Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6384833Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6385262Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6385715Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6386134Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6386607Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6387252Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6387846Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6388426Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6388911Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6389328Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6389710Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6390133Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6390510Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6390961Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6391419Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6391829Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6392273Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6392778Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6393267Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6393788Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6394212Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6394606Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6395093Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6395477Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6395969Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6396428Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6397035Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6397520Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6398000Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6398695Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6398998Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6401264Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6401774Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6402717Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6403256Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6404024Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6404602Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6405426Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6406138Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6406670Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6408029Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6408424Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6409200Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6409314Z ('RERUN', {'yellow': True}) [0.4079s] [100%]
2025-12-04T10:35:20.6410633Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6411699Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6412076Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.6412459Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.6412909Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6413372Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6413831Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6414335Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6414827Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6415356Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6415750Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6416290Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6416742Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6417207Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6417758Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6418212Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6418665Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6419135Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6419542Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6419943Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.6420368Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.6421016Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6421607Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6422233Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6422690Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6423099Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6423484Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.6423906Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6424332Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.6424746Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6425198Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6425633Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6426112Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6426659Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6427155Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6427627Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6428056Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6428451Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.6428935Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6429375Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.6429866Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6430332Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6430926Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6431416Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6431853Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6432454Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6432762Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6435041Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6435561Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6436499Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6437042Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6437797Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6438428Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6439177Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6439833Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6440354Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6441456Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6441772Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6442530Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6442623Z FAILED [0.4078s] [100%]
2025-12-04T10:35:20.6442628Z 
2025-12-04T10:35:20.6442746Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.6443081Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6443189Z Traceback (most recent call last):
2025-12-04T10:35:20.6443546Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6443748Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6444172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6444387Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6444833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6445037Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6445478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6445605Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6446060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6451223Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6451704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6451904Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6452325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6452426Z     return self._compile_to_module()
2025-12-04T10:35:20.6452847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6452993Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6453438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6453551Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6454057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6454257Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6454766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6454878Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6455346Z   File "/tmp/tmpezxridec/yn/cyntqhsjvsr3mrcrbdyk6euuldxkncwn3sh3lvcq3ku2l5nnwg7t.py", line 137, in <module>
2025-12-04T10:35:20.6455745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6455839Z     kernel.precompile(
2025-12-04T10:35:20.6456319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6456419Z     self._precompile_worker()
2025-12-04T10:35:20.6456970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6457130Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6457638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6457813Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6458199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6458405Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6458784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6459136Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6459365Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6460112Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6460188Z ^
2025-12-04T10:35:20.6460615Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6460621Z 
2025-12-04T10:35:20.6461320Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6461325Z 
2025-12-04T10:35:20.6461329Z 
2025-12-04T10:35:20.6461516Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6462262Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6462269Z 
2025-12-04T10:35:20.6462542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6462723Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6462811Z frames [('total', 1)]
2025-12-04T10:35:20.6462919Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6463324Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6463513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6463604Z graph_break []
2025-12-04T10:35:20.6463940Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6464048Z Traceback (most recent call last):
2025-12-04T10:35:20.6464452Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6464647Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6465068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6465275Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6465712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6465881Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6466311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6466438Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6466886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6467198Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6467649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6467769Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6468186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6468288Z     return self._compile_to_module()
2025-12-04T10:35:20.6468698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6468839Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6469280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6469393Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6469823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6470017Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6470519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6470622Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6471116Z   File "/tmp/tmpwgnsy2ge/p2/cp2slbvsckyo36qdautgtbjhvb7obiosuimpvfzy66mivlarwy4b.py", line 137, in <module>
2025-12-04T10:35:20.6471514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6471612Z     kernel.precompile(
2025-12-04T10:35:20.6472091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6472196Z     self._precompile_worker()
2025-12-04T10:35:20.6472702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6472900Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6473404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6473572Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6473957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6474164Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6474544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6474827Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6475063Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6475860Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6475934Z ^
2025-12-04T10:35:20.6476335Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6476343Z 
2025-12-04T10:35:20.6476954Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6476959Z 
2025-12-04T10:35:20.6476964Z 
2025-12-04T10:35:20.6477151Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6477941Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6477952Z 
2025-12-04T10:35:20.6478176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6478369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6478456Z frames [('total', 1)]
2025-12-04T10:35:20.6478552Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6478957Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6479149Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6479234Z graph_break []
2025-12-04T10:35:20.6479415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6479498Z frames [('total', 1)]
2025-12-04T10:35:20.6479607Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6479797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6480197Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6480287Z graph_break []
2025-12-04T10:35:20.6480406Z =================================== FAILURES ===================================
2025-12-04T10:35:20.6480746Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _
2025-12-04T10:35:20.6480889Z Traceback (most recent call last):
2025-12-04T10:35:20.6481251Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6481459Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6481878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6482099Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6482538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6482743Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6483186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6483305Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6483762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6484042Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6484488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6484664Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6485083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6485186Z     return self._compile_to_module()
2025-12-04T10:35:20.6485609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6485745Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6486189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6486296Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6486714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6486916Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6487461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6487571Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6488029Z   File "/tmp/tmpm00rcv7e/yj/cyjmet6qihmitx5eqm2uuxqdg5ogetcaau2encawtmy5j6p7ntr2.py", line 137, in <module>
2025-12-04T10:35:20.6488423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6488524Z     kernel.precompile(
2025-12-04T10:35:20.6489001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6489098Z     self._precompile_worker()
2025-12-04T10:35:20.6489614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6489762Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6490284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6490451Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6490833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6491047Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6491462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6491747Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6491951Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6492649Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6492732Z ^
2025-12-04T10:35:20.6493127Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6493198Z 
2025-12-04T10:35:20.6493805Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6493816Z 
2025-12-04T10:35:20.6493820Z 
2025-12-04T10:35:20.6494005Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6494750Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6494755Z 
2025-12-04T10:35:20.6494990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6495212Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6495303Z frames [('total', 1)]
2025-12-04T10:35:20.6495398Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6495798Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6495984Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6496061Z graph_break []
2025-12-04T10:35:20.6496243Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6496333Z frames [('total', 1)]
2025-12-04T10:35:20.6496428Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6496614Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6497010Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6497087Z graph_break []
2025-12-04T10:35:20.6497313Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6497398Z frames [('total', 1)]
2025-12-04T10:35:20.6497491Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6497680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6498069Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6498158Z graph_break []
2025-12-04T10:35:20.6498722Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml -
2025-12-04T10:35:20.6498869Z =========================== short test summary info ============================
2025-12-04T10:35:20.6499661Z FAILED [0.4078s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6500352Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6500436Z ^
2025-12-04T10:35:20.6500828Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6500833Z 
2025-12-04T10:35:20.6501484Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6501489Z 
2025-12-04T10:35:20.6501492Z 
2025-12-04T10:35:20.6501679Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6502421Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6502428Z 
2025-12-04T10:35:20.6502660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6502812Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.6503019Z ================== 1 failed, 187 deselected, 2 rerun in 2.73s ==================
2025-12-04T10:35:20.6503107Z Got exit code 1
2025-12-04T10:35:20.6503639Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6503995Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.6504398Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml
2025-12-04T10:35:20.6504535Z ============================= test session starts ==============================
2025-12-04T10:35:20.6504839Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.6504974Z cachedir: .pytest_cache
2025-12-04T10:35:20.6505428Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.6505531Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.6505624Z configfile: pytest.ini
2025-12-04T10:35:20.6506094Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.6506290Z collecting ... collected 188 items / 36 deselected / 152 selected
2025-12-04T10:35:20.6506412Z stepcurrent: skipping 36 already run items.
2025-12-04T10:35:20.6506526Z Running 152 items in this shard
2025-12-04T10:35:20.6506530Z 
2025-12-04T10:35:20.6507985Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6509072Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6509460Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6509859Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6510255Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6510716Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6511200Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6511699Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6512215Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6512773Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6513168Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6513539Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6514052Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6514709Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6515222Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6515720Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6516178Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6516633Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6517131Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6517539Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6517947Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6518615Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6519060Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6519568Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6520247Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6520782Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6521120Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6521641Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6522160Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6522715Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6523336Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6523746Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6524197Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6524602Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6525141Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6525612Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6526134Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6527391Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6527850Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6528310Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6528738Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6529190Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6529594Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6530272Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6530730Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6531160Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6531551Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6532027Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6532415Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6532834Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6533295Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6533711Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6534155Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6534655Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6535150Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6535629Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6536100Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6536562Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6537052Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6537441Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6537931Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6538428Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6538934Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6539490Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6539965Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6540264Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6542324Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6542781Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6543718Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6544251Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6545006Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6545589Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6546341Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6547010Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6547529Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6548509Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6548815Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6549575Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6549695Z ('RERUN', {'yellow': True}) [1.8138s] [  0%]
2025-12-04T10:35:20.6550857Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6551833Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6552204Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6552583Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6553012Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6553467Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6553933Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6554430Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6554933Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6555399Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6555854Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6556235Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6556740Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6557246Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6557756Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6558249Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6558705Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6559152Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6559571Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6560020Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6560424Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6561085Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6561533Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6562076Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6562687Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6563198Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6563532Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6564053Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6564594Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6565143Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6565754Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6566208Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6566618Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6567055Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6567592Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6568046Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6568510Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6569004Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6569452Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6569903Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6570322Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6570726Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6571171Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6571835Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6572284Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6572708Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6573094Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6573564Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6573945Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6574366Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6574828Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6575251Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6575743Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6576243Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6576743Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6577220Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6577642Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6578040Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6578603Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6578996Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6579535Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6579995Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6580500Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6580986Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6581458Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6581760Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6583862Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6584324Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6585261Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6585795Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6586558Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6587148Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6587937Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6588606Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6589125Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6590065Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6590418Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6591183Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6591305Z ('RERUN', {'yellow': True}) [0.3427s] [  0%]
2025-12-04T10:35:20.6592468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6593403Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6593783Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6594179Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6594562Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6595063Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6595531Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6596068Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6596576Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6597086Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6597467Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6597834Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6598342Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6598850Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6599401Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6599896Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6600350Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6600801Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6601215Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6601636Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6602091Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6602765Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6603220Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6603736Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6604352Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6604890Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6605233Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6605798Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6606360Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6606923Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6607537Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6608237Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6608679Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6609175Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6609725Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6610189Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6610660Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6611171Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6611685Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6612135Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6612577Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6612987Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6613399Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6614120Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6614593Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6615030Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6615431Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6615868Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6616268Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6616701Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6617161Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6617584Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6618049Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6618607Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6619163Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6619647Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6620073Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6620537Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6621032Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6621433Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6621920Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6622380Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6622974Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6623473Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6623961Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6624273Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6626401Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6626876Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6627788Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6628329Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6629109Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6629696Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6630502Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6631171Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6631697Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6632645Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6632999Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6633782Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6633872Z FAILED [0.3429s] [  0%]
2025-12-04T10:35:20.6633878Z 
2025-12-04T10:35:20.6634001Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.6634366Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.6634514Z Traceback (most recent call last):
2025-12-04T10:35:20.6634878Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6635090Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6635512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6635768Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6636213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6636375Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6636817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6636941Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6637445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6637720Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6638172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6638301Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6638710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6638813Z     return self._compile_to_module()
2025-12-04T10:35:20.6639226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6639364Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6639818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6639930Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6640350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6640550Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6641053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6641206Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6641646Z   File "/tmp/tmpda4qg5z6/dc/cdclxrasc7tnmn2qdxjuzbb62bszhhlc4uedhzudqv2wqb7b3uhc.py", line 65, in <module>
2025-12-04T10:35:20.6642043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6642144Z     kernel.precompile(
2025-12-04T10:35:20.6642620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6642723Z     self._precompile_worker()
2025-12-04T10:35:20.6643283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6643433Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6643948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6644117Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6644501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6644722Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6645099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6645447Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6645644Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6646204Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6646286Z ^
2025-12-04T10:35:20.6646685Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6646690Z 
2025-12-04T10:35:20.6647308Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6647313Z 
2025-12-04T10:35:20.6647317Z 
2025-12-04T10:35:20.6647502Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6648301Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6648315Z 
2025-12-04T10:35:20.6648546Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6648734Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6648828Z frames [('total', 1)]
2025-12-04T10:35:20.6648929Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6649346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6649548Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6649633Z graph_break []
2025-12-04T10:35:20.6649993Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.6650103Z Traceback (most recent call last):
2025-12-04T10:35:20.6650473Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6650683Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6651105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6651315Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6651807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6651970Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6652419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6652543Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6653011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6653304Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6653790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6653922Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6654334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6654434Z     return self._compile_to_module()
2025-12-04T10:35:20.6654855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6654991Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6655436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6655599Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6656068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6656275Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6656783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6656897Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6657349Z   File "/tmp/tmpvn380q15/qm/cqm2xrqur3j5xpu5vzdagyhepg23xyhngtfgcteeihnfpwyyneq4.py", line 65, in <module>
2025-12-04T10:35:20.6657756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6657862Z     kernel.precompile(
2025-12-04T10:35:20.6658380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6658483Z     self._precompile_worker()
2025-12-04T10:35:20.6659009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6659244Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6659756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6659938Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6660320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6660545Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6660919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6661216Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6661431Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6661993Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6662069Z ^
2025-12-04T10:35:20.6662507Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6662512Z 
2025-12-04T10:35:20.6663121Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6663126Z 
2025-12-04T10:35:20.6663136Z 
2025-12-04T10:35:20.6663318Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6664080Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6664150Z 
2025-12-04T10:35:20.6664381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6664561Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6664646Z frames [('total', 1)]
2025-12-04T10:35:20.6664749Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6665150Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6665344Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6665422Z graph_break []
2025-12-04T10:35:20.6665598Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6665731Z frames [('total', 1)]
2025-12-04T10:35:20.6665825Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6666007Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6666411Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6666492Z graph_break []
2025-12-04T10:35:20.6666617Z =================================== FAILURES ===================================
2025-12-04T10:35:20.6666967Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.6667065Z Traceback (most recent call last):
2025-12-04T10:35:20.6667434Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6667629Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6668041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6668301Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6668742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6668908Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6669340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6669462Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6669919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6670195Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6670651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6670781Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6671192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6671302Z     return self._compile_to_module()
2025-12-04T10:35:20.6671715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6671863Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6672346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6672456Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6672885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6673079Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6673588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6673742Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6674184Z   File "/tmp/tmp5gxiw3nv/zm/czmvsiylcrotx6rygblxanr2ss6pekd4iffo5u36pmlqw4sfu34f.py", line 65, in <module>
2025-12-04T10:35:20.6674592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6674687Z     kernel.precompile(
2025-12-04T10:35:20.6675165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6675271Z     self._precompile_worker()
2025-12-04T10:35:20.6675782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6675978Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6676491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6676661Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6677054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6677265Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6677649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6677940Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6678137Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6678748Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6678823Z ^
2025-12-04T10:35:20.6679220Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6679227Z 
2025-12-04T10:35:20.6679842Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6679846Z 
2025-12-04T10:35:20.6679850Z 
2025-12-04T10:35:20.6680032Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6680797Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6680803Z 
2025-12-04T10:35:20.6681034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6681228Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6681315Z frames [('total', 1)]
2025-12-04T10:35:20.6681412Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6681831Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6682015Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6682098Z graph_break []
2025-12-04T10:35:20.6682332Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6682420Z frames [('total', 1)]
2025-12-04T10:35:20.6682512Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6682710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6683107Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6683193Z graph_break []
2025-12-04T10:35:20.6683376Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6683461Z frames [('total', 1)]
2025-12-04T10:35:20.6683607Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6683791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6684188Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6684273Z graph_break []
2025-12-04T10:35:20.6684837Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml -
2025-12-04T10:35:20.6684988Z =========================== short test summary info ============================
2025-12-04T10:35:20.6685721Z FAILED [0.3429s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6686320Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6686401Z ^
2025-12-04T10:35:20.6686801Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6686806Z 
2025-12-04T10:35:20.6687427Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6687432Z 
2025-12-04T10:35:20.6687436Z 
2025-12-04T10:35:20.6687625Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6688393Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6688401Z 
2025-12-04T10:35:20.6688675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6688832Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.6689021Z ================== 1 failed, 36 deselected, 2 rerun in 2.53s ===================
2025-12-04T10:35:20.6689110Z Got exit code 1
2025-12-04T10:35:20.6689210Z Retrying single test...
2025-12-04T10:35:20.6689625Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml
2025-12-04T10:35:20.6689765Z ============================= test session starts ==============================
2025-12-04T10:35:20.6690075Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.6690168Z cachedir: .pytest_cache
2025-12-04T10:35:20.6690622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.6690746Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.6690837Z configfile: pytest.ini
2025-12-04T10:35:20.6691322Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.6691515Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.6692327Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6692440Z Running 1 items in this shard
2025-12-04T10:35:20.6692445Z 
2025-12-04T10:35:20.6693618Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6694575Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6694994Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6695376Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6695788Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6696248Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6696723Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6697268Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6697781Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6698255Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6698647Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6699071Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6699583Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6700136Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6700659Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6701155Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6701624Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6702075Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6702515Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6702937Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6703339Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6704088Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6704542Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6705054Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6705677Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6706206Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6706589Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6707120Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6707632Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6708495Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6709213Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6709623Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6710025Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6710438Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6710981Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6711452Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6711983Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6712485Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6712950Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6713401Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6713843Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6714262Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6714674Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6715350Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6715823Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6716363Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6716754Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6717199Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6717605Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6718031Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6718568Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6718999Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6719457Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6719965Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6720470Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6721005Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6721428Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6721833Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6722323Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6722722Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6723262Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6723732Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6724253Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6724753Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6725241Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6725553Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6727623Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6728100Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6728996Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6729547Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6730351Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6730950Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6731707Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6732386Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6732954Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6733918Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6734240Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6735016Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6735140Z ('RERUN', {'yellow': True}) [1.8093s] [100%]
2025-12-04T10:35:20.6736357Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6737310Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6737684Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6738080Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6738485Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6738947Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6739495Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6739991Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6740548Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6741023Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6741406Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6741788Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6742338Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6742854Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6743375Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6743876Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6744346Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6744843Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6745274Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6745687Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6746141Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6746814Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6747259Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6747842Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6748469Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6748994Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6749343Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6756575Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6757159Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6757816Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6758458Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6758940Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6759345Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6759752Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6760296Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6760800Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6761269Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6761763Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6762219Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6762665Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6763137Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6763543Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6763951Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6764622Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6765078Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6765509Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6765942Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6766375Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6766767Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6767196Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6767659Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6768082Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6768541Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6769053Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6769546Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6770080Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6770500Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6770900Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6771390Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6771775Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6772314Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6772774Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6773290Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6773783Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6774256Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6774605Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6776628Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6777129Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6778033Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6778572Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6779403Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6779990Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6780747Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6781409Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6781973Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6782912Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6783226Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6783995Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6784152Z ('RERUN', {'yellow': True}) [0.3439s] [100%]
2025-12-04T10:35:20.6785324Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6786266Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6786644Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6787064Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6787457Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6787917Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6788383Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6788877Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6789381Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6789902Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6790287Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6790659Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6791168Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6791668Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6792182Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6792681Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6793134Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6793580Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6794045Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6794449Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6794842Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6795509Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6795996Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6796500Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6797111Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6797631Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6797972Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6798561Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6799064Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6799617Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6800222Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6800629Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6801075Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6801480Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6802024Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6802484Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6802946Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6803434Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6803897Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6804353Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6804779Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6805232Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6805633Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6806351Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6806804Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6807301Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6807692Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6808455Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6808849Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6809268Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6809733Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6810243Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6810696Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6811201Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6811693Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6812175Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6812593Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6813049Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6813542Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6813925Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6814423Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6814878Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6815389Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6815887Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6816360Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6816660Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6818737Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6819305Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6820200Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6820740Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6821500Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6822125Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6822876Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6823602Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6824182Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6825180Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6825494Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6826258Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6826347Z FAILED [0.3421s] [100%]
2025-12-04T10:35:20.6826353Z 
2025-12-04T10:35:20.6826471Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.6826822Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.6826924Z Traceback (most recent call last):
2025-12-04T10:35:20.6827283Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6827494Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6827909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6828123Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6828568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6828774Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6829214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6829335Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6829790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6830071Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6830516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6830687Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6831093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6831196Z     return self._compile_to_module()
2025-12-04T10:35:20.6831612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6831746Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6832183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6832297Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6832763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6832968Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6833471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6833575Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6834018Z   File "/tmp/tmp0waadcb3/lt/cltt5eksho3vm3dp6rgm62r2zcrl2k3djay2ye2ud5knou7ih2ln.py", line 65, in <module>
2025-12-04T10:35:20.6834417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6834516Z     kernel.precompile(
2025-12-04T10:35:20.6834989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6835087Z     self._precompile_worker()
2025-12-04T10:35:20.6835643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6835796Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6836306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6836476Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6836862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6837070Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6837443Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6837722Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6837925Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6838478Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6838557Z ^
2025-12-04T10:35:20.6839073Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6839078Z 
2025-12-04T10:35:20.6839738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6839750Z 
2025-12-04T10:35:20.6839754Z 
2025-12-04T10:35:20.6839938Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6840698Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6840705Z 
2025-12-04T10:35:20.6840934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6841183Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6841274Z frames [('total', 1)]
2025-12-04T10:35:20.6841372Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6841782Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6841982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6842065Z graph_break []
2025-12-04T10:35:20.6842410Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.6842520Z Traceback (most recent call last):
2025-12-04T10:35:20.6842885Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6843129Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6843552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6843766Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6844210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6844380Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6844817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6844946Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6845401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6845727Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6846175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6846301Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6846719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6846818Z     return self._compile_to_module()
2025-12-04T10:35:20.6847239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6847374Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6847816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6847928Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6848351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6848544Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6849056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6849163Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6849678Z   File "/tmp/tmpuzx4q8fc/qg/cqgu4noh5scrxf2qf3gkpdw36bu5clypbzntfxxutzenjieujlo2.py", line 65, in <module>
2025-12-04T10:35:20.6850074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6850163Z     kernel.precompile(
2025-12-04T10:35:20.6850638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6850738Z     self._precompile_worker()
2025-12-04T10:35:20.6851255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6851447Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6851953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6852129Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6852513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6852720Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6853100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6853381Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6853623Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6854178Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6854250Z ^
2025-12-04T10:35:20.6854644Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6854649Z 
2025-12-04T10:35:20.6855255Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6855260Z 
2025-12-04T10:35:20.6855264Z 
2025-12-04T10:35:20.6855453Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6856295Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6856303Z 
2025-12-04T10:35:20.6856535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6856721Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6856805Z frames [('total', 1)]
2025-12-04T10:35:20.6856909Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6857314Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6857498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6857583Z graph_break []
2025-12-04T10:35:20.6857759Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6857851Z frames [('total', 1)]
2025-12-04T10:35:20.6857943Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6858131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6858545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6858625Z graph_break []
2025-12-04T10:35:20.6858747Z =================================== FAILURES ===================================
2025-12-04T10:35:20.6859158Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.6859257Z Traceback (most recent call last):
2025-12-04T10:35:20.6859671Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.6859868Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.6860282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.6860503Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.6860951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.6861153Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.6861597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6861714Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6862174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6862447Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6862888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6863013Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6863467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6863569Z     return self._compile_to_module()
2025-12-04T10:35:20.6863988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6864126Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6864577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6864682Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6865101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6865299Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6865846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6866005Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6866434Z   File "/tmp/tmpto3tdc64/hc/chcj4h7nlexnlwy5u3m3zrqjy52nrim6jdsq5kw67oriq3by3id7.py", line 65, in <module>
2025-12-04T10:35:20.6866829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6866925Z     kernel.precompile(
2025-12-04T10:35:20.6867399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6867499Z     self._precompile_worker()
2025-12-04T10:35:20.6868007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6868161Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6868681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6868849Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6869234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6869451Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6869826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6870158Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6870354Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6870913Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6870993Z ^
2025-12-04T10:35:20.6871390Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6871395Z 
2025-12-04T10:35:20.6872013Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6872061Z 
2025-12-04T10:35:20.6872065Z 
2025-12-04T10:35:20.6872245Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6873014Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6873019Z 
2025-12-04T10:35:20.6873246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6873426Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6873558Z frames [('total', 1)]
2025-12-04T10:35:20.6873659Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6874059Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6874261Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6874340Z graph_break []
2025-12-04T10:35:20.6874526Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6874609Z frames [('total', 1)]
2025-12-04T10:35:20.6874709Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6874895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6875293Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6875371Z graph_break []
2025-12-04T10:35:20.6875555Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6875640Z frames [('total', 1)]
2025-12-04T10:35:20.6875869Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6876052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6876445Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.6876531Z graph_break []
2025-12-04T10:35:20.6877093Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml -
2025-12-04T10:35:20.6877243Z =========================== short test summary info ============================
2025-12-04T10:35:20.6877986Z FAILED [0.3421s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6878541Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6878619Z ^
2025-12-04T10:35:20.6879014Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6879021Z 
2025-12-04T10:35:20.6879636Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6879641Z 
2025-12-04T10:35:20.6879645Z 
2025-12-04T10:35:20.6879893Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6880646Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6880651Z 
2025-12-04T10:35:20.6880884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6881039Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.6881212Z ================== 1 failed, 187 deselected, 2 rerun in 2.53s ==================
2025-12-04T10:35:20.6881338Z Got exit code 1
2025-12-04T10:35:20.6881426Z Retrying single test...
2025-12-04T10:35:20.6881831Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml
2025-12-04T10:35:20.6881966Z ============================= test session starts ==============================
2025-12-04T10:35:20.6882264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.6882363Z cachedir: .pytest_cache
2025-12-04T10:35:20.6882808Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.6882911Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.6883044Z configfile: pytest.ini
2025-12-04T10:35:20.6883506Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.6883704Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.6884396Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6884495Z Running 1 items in this shard
2025-12-04T10:35:20.6884500Z 
2025-12-04T10:35:20.6885668Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6886667Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6887054Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6887436Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6887829Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6888281Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6888750Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6889248Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6889745Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6890228Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6890650Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6891020Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6891526Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6892020Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6892538Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6893069Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6893525Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6893973Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6894384Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6894797Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6895231Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6895899Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6896349Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6896856Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6897472Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6898025Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6898370Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6898887Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6899455Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6900003Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6900603Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6901017Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6901422Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6901830Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6902420Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6902872Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6903336Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6903832Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6904337Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6904781Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6905206Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6905618Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6906014Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6906727Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6907176Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6907602Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6908324Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6908778Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6909171Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6909681Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6910149Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6910567Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6911013Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6911523Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6912012Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6912500Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6912923Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6913311Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6913869Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6914256Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6914750Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6915215Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6915725Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6916272Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6916735Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6917044Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6919058Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6919587Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6920476Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6921088Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6921848Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6922429Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6923179Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6923836Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6924366Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6925297Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6925609Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6926415Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6926535Z ('RERUN', {'yellow': True}) [1.7976s] [100%]
2025-12-04T10:35:20.6927697Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6928675Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6929047Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6929423Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6929816Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6930268Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6930856Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6931351Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6931846Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6932320Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6932695Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6933062Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6933612Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6934110Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6934625Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6935116Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6935571Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6936023Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6936436Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6936850Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6937242Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6937946Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6938389Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6938889Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6939547Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6940108Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6940451Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6940967Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6941469Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6942058Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6942658Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6943065Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6943466Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6943869Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6944403Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6944897Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6945366Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6945853Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6946308Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6946751Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6947168Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6947574Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6947967Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6948637Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6949124Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6949559Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6949944Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6950373Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6950802Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6951219Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6951688Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6952110Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6952551Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6953059Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6953590Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6954075Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6954497Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6954893Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6955378Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6955807Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6956307Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6956763Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6957272Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6957763Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6958222Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6958534Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6960580Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6961044Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6961938Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6962526Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6963284Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6963869Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6964620Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6965350Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6965914Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6966864Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6967173Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6967979Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6968096Z ('RERUN', {'yellow': True}) [0.3398s] [100%]
2025-12-04T10:35:20.6969255Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6970188Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6970565Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 8192
2025-12-04T10:35:20.6970945Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 4096
2025-12-04T10:35:20.6971336Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.6971794Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6972253Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6972790Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6973283Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6973760Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6974142Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.6974557Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.6975065Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6975565Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6976133Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6976619Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6977113Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6977570Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6977992Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6978399Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6978789Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6979563Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6980011Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6980518Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6981132Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6981645Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]             tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6981988Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         )
2025-12-04T10:35:20.6982512Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6983017Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6983566Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6984211Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6984621Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6985022Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6985434Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6986011Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6986463Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6986930Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6987419Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6987877Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6988371Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6988791Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.6989197Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.6989589Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index
2025-12-04T10:35:20.6990254Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6990699Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6991166Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6991555Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = 4096.0
2025-12-04T10:35:20.6991980Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6992376Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = 1e-05
2025-12-04T10:35:20.6992792Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6993252Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6993672Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6994114Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6994624Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6995158Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6995635Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6996059Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6996463Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp25 = -448.0
2025-12-04T10:35:20.6996954Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6997384Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp27 = 448.0
2025-12-04T10:35:20.6997879Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6998345Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6998858Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6999346Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6999848Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.7000156Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.7002201Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7002665Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.7003560Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7004109Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7004863Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7005447Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7006251Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7006954Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7007472Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7008652Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.7008970Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.7009839Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7009929Z FAILED [0.3395s] [100%]
2025-12-04T10:35:20.7009934Z 
2025-12-04T10:35:20.7010054Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7010401Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7010513Z Traceback (most recent call last):
2025-12-04T10:35:20.7010871Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.7011072Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.7011550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7011762Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7012203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7012369Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7012813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7012931Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7013391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7013669Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7014168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7014291Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7014706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7014803Z     return self._compile_to_module()
2025-12-04T10:35:20.7015220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7015357Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7015800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7015911Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7016333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7016541Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7017038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7017143Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7017581Z   File "/tmp/tmpdu0qqvj8/bp/cbpkoaotyt6w3t6nhfvncbru7hq5du56hssi4mo7kfhvs2wz4oly.py", line 65, in <module>
2025-12-04T10:35:20.7018037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7018127Z     kernel.precompile(
2025-12-04T10:35:20.7018604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7018697Z     self._precompile_worker()
2025-12-04T10:35:20.7019255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7019406Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7019910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7020122Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7020501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7020712Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7021085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7021367Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7021563Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7022163Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.7022236Z ^
2025-12-04T10:35:20.7022637Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7022642Z 
2025-12-04T10:35:20.7023247Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7023253Z 
2025-12-04T10:35:20.7023257Z 
2025-12-04T10:35:20.7023446Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7024202Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7024209Z 
2025-12-04T10:35:20.7024479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7024659Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7024746Z frames [('total', 1)]
2025-12-04T10:35:20.7024841Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.7025245Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7025435Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.7025515Z graph_break []
2025-12-04T10:35:20.7025861Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7025966Z Traceback (most recent call last):
2025-12-04T10:35:20.7026321Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.7026512Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.7026931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7027141Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7027584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7027748Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7028224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7028353Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7028815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7029084Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7029540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7029708Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7030122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7030223Z     return self._compile_to_module()
2025-12-04T10:35:20.7030639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7030777Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7031218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7031330Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7031749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7031985Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7032495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7032603Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7033019Z   File "/tmp/tmp253oyr_d/sd/csdy5hvu45hpw625y3fiiuwr7p4dczxtmhsvf47xxu3eiw4tjv7f.py", line 65, in <module>
2025-12-04T10:35:20.7033423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7033516Z     kernel.precompile(
2025-12-04T10:35:20.7034001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7034096Z     self._precompile_worker()
2025-12-04T10:35:20.7034650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7034805Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7035315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7035489Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7035897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7036129Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7036509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7036801Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7036997Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7037554Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.7037628Z ^
2025-12-04T10:35:20.7038023Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7038028Z 
2025-12-04T10:35:20.7038680Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7038685Z 
2025-12-04T10:35:20.7038689Z 
2025-12-04T10:35:20.7038879Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7039635Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7039642Z 
2025-12-04T10:35:20.7039872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7040057Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7040178Z frames [('total', 1)]
2025-12-04T10:35:20.7040277Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.7040682Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7040871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.7040960Z graph_break []
2025-12-04T10:35:20.7041138Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7041220Z frames [('total', 1)]
2025-12-04T10:35:20.7041323Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.7041505Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.7041902Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7042025Z graph_break []
2025-12-04T10:35:20.7042142Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7042495Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7042597Z Traceback (most recent call last):
2025-12-04T10:35:20.7042955Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.7043158Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.7043572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7043788Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7044220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7044429Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7044875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7045000Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7045458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7045737Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7046236Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7046365Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7046770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7046877Z     return self._compile_to_module()
2025-12-04T10:35:20.7047293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7047427Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7047871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7047973Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7048463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7048660Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7049159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7049265Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7049711Z   File "/tmp/tmp7x5nkcmi/ri/cribppqv3iczsynsh4fmdqllfgmzb7uflwk3zo7z6svfapfmas3g.py", line 65, in <module>
2025-12-04T10:35:20.7050106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7050243Z     kernel.precompile(
2025-12-04T10:35:20.7050719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7050820Z     self._precompile_worker()
2025-12-04T10:35:20.7051333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7051482Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7051988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7052148Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7052656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7052866Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7053244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7053533Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7053726Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7054280Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.7054359Z ^
2025-12-04T10:35:20.7054750Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7054758Z 
2025-12-04T10:35:20.7055407Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7055415Z 
2025-12-04T10:35:20.7055419Z 
2025-12-04T10:35:20.7055598Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7056353Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7056362Z 
2025-12-04T10:35:20.7056583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7056761Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7056851Z frames [('total', 1)]
2025-12-04T10:35:20.7056945Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.7057346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7057533Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.7061401Z graph_break []
2025-12-04T10:35:20.7061608Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7061704Z frames [('total', 1)]
2025-12-04T10:35:20.7061801Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.7061997Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.7062472Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7062553Z graph_break []
2025-12-04T10:35:20.7062741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7062830Z frames [('total', 1)]
2025-12-04T10:35:20.7062929Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.7063131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.7063524Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7063653Z graph_break []
2025-12-04T10:35:20.7064215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml -
2025-12-04T10:35:20.7064360Z =========================== short test summary info ============================
2025-12-04T10:35:20.7065115Z FAILED [0.3395s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7065669Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.7065791Z ^
2025-12-04T10:35:20.7066186Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7066192Z 
2025-12-04T10:35:20.7066802Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7066807Z 
2025-12-04T10:35:20.7066817Z 
2025-12-04T10:35:20.7067001Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7067761Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7067766Z 
2025-12-04T10:35:20.7067996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7068153Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7068366Z ================== 1 failed, 187 deselected, 2 rerun in 2.51s ==================
2025-12-04T10:35:20.7068458Z Got exit code 1
2025-12-04T10:35:20.7069003Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7069364Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.7069772Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml
2025-12-04T10:35:20.7069909Z ============================= test session starts ==============================
2025-12-04T10:35:20.7070213Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7070306Z cachedir: .pytest_cache
2025-12-04T10:35:20.7070760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7070868Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7070958Z configfile: pytest.ini
2025-12-04T10:35:20.7071425Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7071620Z collecting ... collected 188 items / 37 deselected / 151 selected
2025-12-04T10:35:20.7071742Z stepcurrent: skipping 37 already run items.
2025-12-04T10:35:20.7071849Z Running 151 items in this shard
2025-12-04T10:35:20.7071854Z 
2025-12-04T10:35:20.7072385Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [1.9819s] [  0%]
2025-12-04T10:35:20.7072882Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.5899s] [  1%]
2025-12-04T10:35:20.7073369Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.7384s] [  1%]
2025-12-04T10:35:20.7073862Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.7400s] [  2%]
2025-12-04T10:35:20.7074404Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.9962s] [  3%]
2025-12-04T10:35:20.7074879Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.6069s] [  3%]
2025-12-04T10:35:20.7075363Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.6624s] [  4%]
2025-12-04T10:35:20.7075847Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.9381s] [  5%]
2025-12-04T10:35:20.7076325Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.6684s] [  5%]
2025-12-04T10:35:20.7076863Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [1.0237s] [  6%]
2025-12-04T10:35:20.7077375Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [1.0386s] [  7%]
2025-12-04T10:35:20.7077891Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [0.9288s] [  7%]
2025-12-04T10:35:20.7078332Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.8715s] [  7%]
2025-12-04T10:35:20.7078337Z 
2025-12-04T10:35:20.7078461Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7078732Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7078840Z Traceback (most recent call last):
2025-12-04T10:35:20.7079227Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7079353Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7079777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7079995Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7080434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7080606Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7081039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7081156Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7081622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7081895Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7082346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7082473Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7082925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7083037Z     return self._compile_to_module()
2025-12-04T10:35:20.7083447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7083583Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7084030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7084145Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7084576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7084810Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7085307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7085418Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7085863Z   File "/tmp/tmp9avqyx1k/n7/cn7cjwsdmcygywdycdpfllorkspoj6wasj2mpbw3p5frzx6xcdqh.py", line 84, in <module>
2025-12-04T10:35:20.7086300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.7086399Z     self._wait_futures(scope)
2025-12-04T10:35:20.7086819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.7086966Z     kernel = result.result()
2025-12-04T10:35:20.7087339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.7087436Z     return self.result_fn()
2025-12-04T10:35:20.7087847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.7087956Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.7088303Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7088308Z 
2025-12-04T10:35:20.7088415Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7088515Z Traceback (most recent call last):
2025-12-04T10:35:20.7088982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7089068Z     result = job()
2025-12-04T10:35:20.7089647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7089766Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7090243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7090350Z     self._precompile_worker()
2025-12-04T10:35:20.7090861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7091012Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7091523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7091695Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7092085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7092296Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7092679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7092975Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7093131Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7093441Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7093516Z ^
2025-12-04T10:35:20.7093907Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7093912Z 
2025-12-04T10:35:20.7093916Z 
2025-12-04T10:35:20.7094539Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7094549Z 
2025-12-04T10:35:20.7094553Z 
2025-12-04T10:35:20.7094733Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7095470Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7095475Z 
2025-12-04T10:35:20.7095702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7095912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7096014Z frames [('total', 1)]
2025-12-04T10:35:20.7096129Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7096322Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7096827Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7096950Z graph_break []
2025-12-04T10:35:20.7097231Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7097332Z Traceback (most recent call last):
2025-12-04T10:35:20.7097677Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7097806Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7098222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7098442Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7098883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7099102Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7099599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7099721Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7100188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7100468Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7100913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7101043Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7101452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7101553Z     return self._compile_to_module()
2025-12-04T10:35:20.7101972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7102115Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7102561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7102675Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7103089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7103336Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7103839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7103957Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7104400Z   File "/tmp/tmpe2ov3yrl/sz/cszzy7yacw2o5jetxjtv3zrfddaibkyxxxvpfobhtqjhc5ahhbv2.py", line 84, in <module>
2025-12-04T10:35:20.7104784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.7104887Z     self._wait_futures(scope)
2025-12-04T10:35:20.7105306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.7105450Z     kernel = result.result()
2025-12-04T10:35:20.7105883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.7105978Z     return self.result_fn()
2025-12-04T10:35:20.7106393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.7106504Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.7106836Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7106841Z 
2025-12-04T10:35:20.7106960Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7107103Z Traceback (most recent call last):
2025-12-04T10:35:20.7107568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7107653Z     result = job()
2025-12-04T10:35:20.7108318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7108441Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7108913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7109012Z     self._precompile_worker()
2025-12-04T10:35:20.7109523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7109675Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7110263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7110433Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7110816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7111029Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7111406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7111695Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7111860Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7112119Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7112196Z ^
2025-12-04T10:35:20.7112585Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7112594Z 
2025-12-04T10:35:20.7112598Z 
2025-12-04T10:35:20.7113219Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7113226Z 
2025-12-04T10:35:20.7113229Z 
2025-12-04T10:35:20.7113415Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7114176Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7114181Z 
2025-12-04T10:35:20.7114426Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7114611Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7114707Z frames [('total', 1)]
2025-12-04T10:35:20.7114809Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7115002Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7115517Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7115669Z graph_break []
2025-12-04T10:35:20.7115853Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7115951Z frames [('total', 1)]
2025-12-04T10:35:20.7116048Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7116253Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7116755Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7116837Z graph_break []
2025-12-04T10:35:20.7116970Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7117311Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7117416Z Traceback (most recent call last):
2025-12-04T10:35:20.7117788Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7117914Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7118347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7118570Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7119014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7119187Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7119630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7119809Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7120275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7120559Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7121014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7121147Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7121565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7121679Z     return self._compile_to_module()
2025-12-04T10:35:20.7122099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7122245Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7122692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7122807Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7123249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7123449Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7124012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7124119Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7124559Z   File "/tmp/tmp5xc3wj4l/yl/cylb2kn5kngs6ygqehp4cszn7o7dv4palhjl66g5zmdghdtn57w2.py", line 84, in <module>
2025-12-04T10:35:20.7124954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.7125056Z     self._wait_futures(scope)
2025-12-04T10:35:20.7125490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.7125649Z     kernel = result.result()
2025-12-04T10:35:20.7126078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.7126186Z     return self.result_fn()
2025-12-04T10:35:20.7126605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.7126715Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.7127057Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7127062Z 
2025-12-04T10:35:20.7127173Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7127280Z Traceback (most recent call last):
2025-12-04T10:35:20.7127742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7127891Z     result = job()
2025-12-04T10:35:20.7128401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7128521Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7128997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7129106Z     self._precompile_worker()
2025-12-04T10:35:20.7129613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7129771Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7130280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7130496Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7130897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7131109Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7131506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7131800Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7131961Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7132234Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7132308Z ^
2025-12-04T10:35:20.7132708Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7132712Z 
2025-12-04T10:35:20.7132718Z 
2025-12-04T10:35:20.7133338Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7133346Z 
2025-12-04T10:35:20.7133350Z 
2025-12-04T10:35:20.7133530Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7134272Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7134278Z 
2025-12-04T10:35:20.7134507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7134702Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7134786Z frames [('total', 1)]
2025-12-04T10:35:20.7134881Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7135085Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7135596Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7135719Z graph_break []
2025-12-04T10:35:20.7135907Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7135996Z frames [('total', 1)]
2025-12-04T10:35:20.7136096Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7136288Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7136793Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7136883Z graph_break []
2025-12-04T10:35:20.7137062Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7137151Z frames [('total', 1)]
2025-12-04T10:35:20.7137252Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7137485Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7137987Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7138075Z graph_break []
2025-12-04T10:35:20.7138633Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml -
2025-12-04T10:35:20.7138787Z =========================== short test summary info ============================
2025-12-04T10:35:20.7139638Z FAILED [0.8715s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7139644Z 
2025-12-04T10:35:20.7139757Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7139861Z Traceback (most recent call last):
2025-12-04T10:35:20.7140378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7140476Z     result = job()
2025-12-04T10:35:20.7140983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7141104Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7141588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7141684Z     self._precompile_worker()
2025-12-04T10:35:20.7142204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7142356Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7142872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7143052Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7143436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7143650Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7144033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7144362Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7144526Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7144787Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7144856Z ^
2025-12-04T10:35:20.7145254Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7145262Z 
2025-12-04T10:35:20.7145268Z 
2025-12-04T10:35:20.7145879Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7145924Z 
2025-12-04T10:35:20.7145928Z 
2025-12-04T10:35:20.7146118Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7146822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7146827Z 
2025-12-04T10:35:20.7147060Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7147257Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7147510Z ============ 1 failed, 10 passed, 37 deselected, 2 rerun in 11.83s =============
2025-12-04T10:35:20.7147686Z Got exit code 1
2025-12-04T10:35:20.7147787Z Retrying single test...
2025-12-04T10:35:20.7148200Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml
2025-12-04T10:35:20.7148339Z ============================= test session starts ==============================
2025-12-04T10:35:20.7148639Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7148743Z cachedir: .pytest_cache
2025-12-04T10:35:20.7149195Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7149296Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7149396Z configfile: pytest.ini
2025-12-04T10:35:20.7149854Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7150053Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7150715Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7150814Z Running 1 items in this shard
2025-12-04T10:35:20.7150819Z 
2025-12-04T10:35:20.7151813Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7152456Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7152934Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7153418Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7153842Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7154219Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7154769Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7155212Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7155589Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7156078Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7156456Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7156981Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7157421Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7157869Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7158337Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7158643Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7160118Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7160590Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7161479Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7162061Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7162827Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7163418Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7164277Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7164943Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7165472Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7166165Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7166483Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7167293Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7167413Z ('RERUN', {'yellow': True}) [2.2361s] [100%]
2025-12-04T10:35:20.7168385Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7169025Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7169561Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7170040Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7170469Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7170845Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7171412Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7171844Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7172223Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7172711Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7173082Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7173568Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7174043Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7174504Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7174979Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7175288Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7176729Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7177190Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7178088Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7178662Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7179474Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7180060Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7180815Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7181522Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7182048Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7182688Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7182995Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7183802Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7183916Z ('RERUN', {'yellow': True}) [0.5971s] [100%]
2025-12-04T10:35:20.7184890Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7185535Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7186051Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7186579Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7187008Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7187374Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7187884Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7188316Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7188695Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7189182Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7189553Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7190034Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7190507Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7190965Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7191432Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7191734Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7193160Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7193662Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7194554Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7195094Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7195925Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7196529Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7197290Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7197947Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7198505Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7199149Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7199458Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7200224Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7200306Z FAILED [0.5922s] [100%]
2025-12-04T10:35:20.7200311Z 
2025-12-04T10:35:20.7200432Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7200705Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7200806Z Traceback (most recent call last):
2025-12-04T10:35:20.7201152Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7201271Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7201684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7201941Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7202376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7202540Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7202972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7203092Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7203549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7203863Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7204310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7204430Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7204836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7204939Z     return self._compile_to_module()
2025-12-04T10:35:20.7205347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7205479Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7205967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7206073Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7206498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7206690Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7207189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7207299Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7207896Z   File "/tmp/tmp5ypzm6cg/wx/cwxpqc56k7bujjofl7t3w4pan3irgnswi2rtqz5sc6zd5obkzjny.py", line 50, in <module>
2025-12-04T10:35:20.7208298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7208393Z     kernel.precompile(
2025-12-04T10:35:20.7208948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7209052Z     self._precompile_worker()
2025-12-04T10:35:20.7209556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7209707Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7210215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7210378Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7210762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7210965Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7211340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7211632Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7211822Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7212084Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7212152Z ^
2025-12-04T10:35:20.7212628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7212634Z 
2025-12-04T10:35:20.7213243Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7213248Z 
2025-12-04T10:35:20.7213252Z 
2025-12-04T10:35:20.7213431Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7214120Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7214184Z 
2025-12-04T10:35:20.7214411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7214589Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7214672Z frames [('total', 1)]
2025-12-04T10:35:20.7214765Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7215177Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7215363Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7215440Z graph_break []
2025-12-04T10:35:20.7215713Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7215810Z Traceback (most recent call last):
2025-12-04T10:35:20.7216215Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7216338Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7216748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7216958Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7217401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7217560Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7217995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7218117Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7218608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7218887Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7219369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7219494Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7219897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7220001Z     return self._compile_to_module()
2025-12-04T10:35:20.7220413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7220545Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7220983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7221093Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7221516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7221715Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7222210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7222315Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7222797Z   File "/tmp/tmpgn8idbvh/4m/c4mi2hok7cfuhktrq6d33hzuiuewjdvalubwy5eqqbafwvdo2jxz.py", line 50, in <module>
2025-12-04T10:35:20.7223191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7223289Z     kernel.precompile(
2025-12-04T10:35:20.7223758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7223858Z     self._precompile_worker()
2025-12-04T10:35:20.7224366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7224642Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7225145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7225315Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7225691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7225899Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7226275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7226606Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7226801Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7227059Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7227129Z ^
2025-12-04T10:35:20.7227515Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7227520Z 
2025-12-04T10:35:20.7228124Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7228133Z 
2025-12-04T10:35:20.7228137Z 
2025-12-04T10:35:20.7228315Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7228999Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7229048Z 
2025-12-04T10:35:20.7229283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7229473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7229563Z frames [('total', 1)]
2025-12-04T10:35:20.7229654Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7230055Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7230250Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7230331Z graph_break []
2025-12-04T10:35:20.7230505Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7230589Z frames [('total', 1)]
2025-12-04T10:35:20.7230677Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7230857Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7231265Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7231347Z graph_break []
2025-12-04T10:35:20.7231473Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7231744Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7231844Z Traceback (most recent call last):
2025-12-04T10:35:20.7232239Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7232361Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7232774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7232981Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7233417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7233581Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7234058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7234173Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7234630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7234902Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7235349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7235468Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7235871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7236016Z     return self._compile_to_module()
2025-12-04T10:35:20.7236428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7236572Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7237008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7237114Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7237543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7237735Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7238230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7238337Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7238808Z   File "/tmp/tmplz2z23p5/rp/crpdk6ftmt6tdgl75i7yffvgnapth7536doixdmeu3ekc7d3fex3.py", line 50, in <module>
2025-12-04T10:35:20.7239214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7239302Z     kernel.precompile(
2025-12-04T10:35:20.7239773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7239877Z     self._precompile_worker()
2025-12-04T10:35:20.7240382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7240532Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7241034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7241200Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7241579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7241784Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7242164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7242495Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7242686Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7242944Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7243012Z ^
2025-12-04T10:35:20.7243398Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7243409Z 
2025-12-04T10:35:20.7244019Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7244065Z 
2025-12-04T10:35:20.7244068Z 
2025-12-04T10:35:20.7244253Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7244934Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7244940Z 
2025-12-04T10:35:20.7245163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7245343Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7245423Z frames [('total', 1)]
2025-12-04T10:35:20.7245516Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7245922Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7246149Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7246227Z graph_break []
2025-12-04T10:35:20.7246409Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7246493Z frames [('total', 1)]
2025-12-04T10:35:20.7246589Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7246773Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7247167Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7247247Z graph_break []
2025-12-04T10:35:20.7247421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7247498Z frames [('total', 1)]
2025-12-04T10:35:20.7247592Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7247772Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7248237Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7248319Z graph_break []
2025-12-04T10:35:20.7248876Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml -
2025-12-04T10:35:20.7249024Z =========================== short test summary info ============================
2025-12-04T10:35:20.7249695Z FAILED [0.5922s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7249955Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7250028Z ^
2025-12-04T10:35:20.7250415Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7250422Z 
2025-12-04T10:35:20.7251035Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7251041Z 
2025-12-04T10:35:20.7251045Z 
2025-12-04T10:35:20.7251220Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7251948Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7251953Z 
2025-12-04T10:35:20.7252175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7252322Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7252493Z ================== 1 failed, 187 deselected, 2 rerun in 3.46s ==================
2025-12-04T10:35:20.7252571Z Got exit code 1
2025-12-04T10:35:20.7252659Z Retrying single test...
2025-12-04T10:35:20.7253070Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml
2025-12-04T10:35:20.7253246Z ============================= test session starts ==============================
2025-12-04T10:35:20.7253538Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7253628Z cachedir: .pytest_cache
2025-12-04T10:35:20.7254079Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7254188Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7254275Z configfile: pytest.ini
2025-12-04T10:35:20.7254733Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7254918Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7255574Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7255670Z Running 1 items in this shard
2025-12-04T10:35:20.7255675Z 
2025-12-04T10:35:20.7256701Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7257350Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7257813Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7258330Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7258758Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7259224Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7259734Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7260168Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7260543Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7261035Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7261410Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7261896Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7262327Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7262814Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7263283Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7263586Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7265022Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7265527Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7266471Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7267007Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7267816Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7268397Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7269148Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7269809Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7270370Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7271013Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7271323Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7272092Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7272197Z ('RERUN', {'yellow': True}) [2.2297s] [100%]
2025-12-04T10:35:20.7273170Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7273818Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7274282Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7274801Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7275222Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7275661Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7276223Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7276656Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7277079Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7277560Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7277934Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7278415Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7278843Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7279340Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7279806Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7280115Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7281541Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7282036Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7282940Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7283484Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7284247Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7284822Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7285578Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7286290Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7286851Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7287486Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7287794Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7288571Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7288719Z ('RERUN', {'yellow': True}) [0.5946s] [100%]
2025-12-04T10:35:20.7289698Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7290336Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7290801Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7291284Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7291769Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7292141Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7292650Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7293086Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7293462Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7293941Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7294361Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7294844Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7295280Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7295728Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7296193Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7296501Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7297930Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7298438Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7299378Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7299911Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7300671Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7301288Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7302049Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7302704Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7303270Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7303902Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7304215Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7304982Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7305061Z FAILED [0.5923s] [100%]
2025-12-04T10:35:20.7305066Z 
2025-12-04T10:35:20.7305184Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7305455Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7305604Z Traceback (most recent call last):
2025-12-04T10:35:20.7305948Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7306070Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7306485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7306692Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7307128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7307291Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7307721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7307986Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7308447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7308718Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7309163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7309283Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7309760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7309859Z     return self._compile_to_module()
2025-12-04T10:35:20.7310267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7310402Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7310839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7310950Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7311366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7311617Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7312115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7312220Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7312639Z   File "/tmp/tmpahwq0k6_/kr/ckrqtec7h2xh5cyp43uhor2apoc2btydbltiilfqv2mgcp3uc3ou.py", line 50, in <module>
2025-12-04T10:35:20.7313031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7313117Z     kernel.precompile(
2025-12-04T10:35:20.7313586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7313739Z     self._precompile_worker()
2025-12-04T10:35:20.7314245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7314393Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7314897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7315061Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7315439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7315654Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7316062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7316406Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7316603Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7316860Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7316927Z ^
2025-12-04T10:35:20.7317322Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7317326Z 
2025-12-04T10:35:20.7317935Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7317940Z 
2025-12-04T10:35:20.7317944Z 
2025-12-04T10:35:20.7318125Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7318810Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7318820Z 
2025-12-04T10:35:20.7319042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7319221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7319304Z frames [('total', 1)]
2025-12-04T10:35:20.7319395Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7319840Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7320026Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7320106Z graph_break []
2025-12-04T10:35:20.7323951Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7324078Z Traceback (most recent call last):
2025-12-04T10:35:20.7324441Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7324567Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7324988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7325272Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7325709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7325883Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7326315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7326438Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7326898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7327224Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7327675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7327796Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7328206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7328313Z     return self._compile_to_module()
2025-12-04T10:35:20.7328726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7328865Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7329308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7329419Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7329895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7330099Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7330595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7330708Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7331146Z   File "/tmp/tmp8eglbmbs/bf/cbfe35rfnvfutp2nzixkzreaq26dk3k4gskjg556ft4wznm5elmy.py", line 50, in <module>
2025-12-04T10:35:20.7331552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7331646Z     kernel.precompile(
2025-12-04T10:35:20.7332123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7332232Z     self._precompile_worker()
2025-12-04T10:35:20.7332739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7332891Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7333407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7333576Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7334008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7334217Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7334589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7334875Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7335075Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7335337Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7335482Z ^
2025-12-04T10:35:20.7335922Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7335928Z 
2025-12-04T10:35:20.7336552Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7336557Z 
2025-12-04T10:35:20.7336561Z 
2025-12-04T10:35:20.7336744Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7337432Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7337479Z 
2025-12-04T10:35:20.7337706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7337888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7337983Z frames [('total', 1)]
2025-12-04T10:35:20.7338078Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7338485Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7338676Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7338761Z graph_break []
2025-12-04T10:35:20.7338950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7339092Z frames [('total', 1)]
2025-12-04T10:35:20.7339188Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7339378Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7339819Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7339909Z graph_break []
2025-12-04T10:35:20.7340034Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7340305Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7340420Z Traceback (most recent call last):
2025-12-04T10:35:20.7340763Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7340889Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7341304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7341514Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7341953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7342121Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7342552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7342678Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7343129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7343451Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7343898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7344020Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7344433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7344532Z     return self._compile_to_module()
2025-12-04T10:35:20.7344948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7345133Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7345573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7345702Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7346162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7346354Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7346862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7346965Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7347452Z   File "/tmp/tmp1djeka8t/tq/ctqdg7vdjzxvazwb4l25rkhb26l3llguhnyebxci7dobe7fnxexh.py", line 50, in <module>
2025-12-04T10:35:20.7347843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7347936Z     kernel.precompile(
2025-12-04T10:35:20.7348409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7348511Z     self._precompile_worker()
2025-12-04T10:35:20.7349021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7349177Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7349688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7349861Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7350287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7350497Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7350876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7351162Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7351362Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7351623Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7351696Z ^
2025-12-04T10:35:20.7352096Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7352101Z 
2025-12-04T10:35:20.7352708Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7352716Z 
2025-12-04T10:35:20.7352722Z 
2025-12-04T10:35:20.7352905Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7353589Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7353593Z 
2025-12-04T10:35:20.7353862Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7354050Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7354136Z frames [('total', 1)]
2025-12-04T10:35:20.7354237Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7354636Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7354835Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7354921Z graph_break []
2025-12-04T10:35:20.7355100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7355227Z frames [('total', 1)]
2025-12-04T10:35:20.7355333Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7355518Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7355967Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7356057Z graph_break []
2025-12-04T10:35:20.7356235Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7356330Z frames [('total', 1)]
2025-12-04T10:35:20.7356424Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7356610Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7357009Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7357136Z graph_break []
2025-12-04T10:35:20.7357696Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml -
2025-12-04T10:35:20.7357852Z =========================== short test summary info ============================
2025-12-04T10:35:20.7358530Z FAILED [0.5923s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7358803Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7358880Z ^
2025-12-04T10:35:20.7359277Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7359288Z 
2025-12-04T10:35:20.7359936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7359944Z 
2025-12-04T10:35:20.7359950Z 
2025-12-04T10:35:20.7360136Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7360824Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7360829Z 
2025-12-04T10:35:20.7361055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7361210Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7361382Z ================== 1 failed, 187 deselected, 2 rerun in 3.45s ==================
2025-12-04T10:35:20.7361460Z Got exit code 1
2025-12-04T10:35:20.7361942Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7362296Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.7362708Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml
2025-12-04T10:35:20.7362842Z ============================= test session starts ==============================
2025-12-04T10:35:20.7363140Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7363289Z cachedir: .pytest_cache
2025-12-04T10:35:20.7363737Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7363841Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7363939Z configfile: pytest.ini
2025-12-04T10:35:20.7364403Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7364605Z collecting ... collected 188 items / 48 deselected / 140 selected
2025-12-04T10:35:20.7364725Z stepcurrent: skipping 48 already run items.
2025-12-04T10:35:20.7364863Z Running 140 items in this shard
2025-12-04T10:35:20.7364868Z 
2025-12-04T10:35:20.7365873Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7366514Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7366991Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7367524Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7367950Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7368323Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7368826Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7369261Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7369647Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7370174Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7370563Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7371046Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7371483Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7371930Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7372405Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7372709Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7374145Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7374672Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7375569Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7376113Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7376874Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7377499Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7378252Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7378917Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7379532Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7380178Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7380500Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7381280Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7381404Z ('RERUN', {'yellow': True}) [2.1332s] [  0%]
2025-12-04T10:35:20.7382440Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7383085Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7383565Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7384047Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7384480Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7384850Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7385381Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7385870Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7386258Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7386793Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7387176Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7387673Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7388111Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7388572Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7389095Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7389401Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7390843Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7391350Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7392259Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7392809Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7393573Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7394175Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7395526Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7396203Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7396730Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7397382Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7397695Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7398476Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7398603Z ('RERUN', {'yellow': True}) [0.6094s] [  0%]
2025-12-04T10:35:20.7399727Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7400371Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7400839Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7401331Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7401795Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7402162Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7402691Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7403123Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7403506Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7404032Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7404403Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7404893Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7405330Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7405794Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7406262Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7406575Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7408276Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7408747Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7409642Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7410182Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7410942Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7411582Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7412346Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7413006Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7413529Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7414231Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7414547Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7415322Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7415410Z FAILED [0.6123s] [  0%]
2025-12-04T10:35:20.7415415Z 
2025-12-04T10:35:20.7415537Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7415881Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7415985Z Traceback (most recent call last):
2025-12-04T10:35:20.7416336Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7416460Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7416871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7417093Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7417528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7417695Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7418127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7418249Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7418772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7419092Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7419552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7419672Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7420079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7420186Z     return self._compile_to_module()
2025-12-04T10:35:20.7420594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7420727Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7421172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7421279Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7421707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7421898Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7422441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7422555Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7422966Z   File "/tmp/tmpe_auaqvz/zz/czz57o3q7co2okbx6hidugeqxaewtskj35xsxxfv4jed6ihd3mas.py", line 50, in <module>
2025-12-04T10:35:20.7423364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7423456Z     kernel.precompile(
2025-12-04T10:35:20.7423930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7424068Z     self._precompile_worker()
2025-12-04T10:35:20.7424575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7424723Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7425234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7425402Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7425781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7425985Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7426415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7426707Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7426902Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7427161Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7427233Z ^
2025-12-04T10:35:20.7427623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7427627Z 
2025-12-04T10:35:20.7428243Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7428248Z 
2025-12-04T10:35:20.7428252Z 
2025-12-04T10:35:20.7428436Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7429188Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7429196Z 
2025-12-04T10:35:20.7429423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7429604Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7429697Z frames [('total', 1)]
2025-12-04T10:35:20.7429791Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7430198Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7430389Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7430468Z graph_break []
2025-12-04T10:35:20.7430760Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7430867Z Traceback (most recent call last):
2025-12-04T10:35:20.7431214Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7431345Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7431769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7431988Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7432469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7432631Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7433078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7433195Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7433654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7433930Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7434411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7434544Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7434954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7435050Z     return self._compile_to_module()
2025-12-04T10:35:20.7435468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7435611Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7436059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7436218Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7436640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7436843Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7437339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7437442Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7439308Z   File "/tmp/tmpgym3s0re/zq/czq2afu4t524vvkyiy5lt74i32ciwkh5tj7hotnwbhmkftpyciwg.py", line 50, in <module>
2025-12-04T10:35:20.7439704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7439803Z     kernel.precompile(
2025-12-04T10:35:20.7440273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7440418Z     self._precompile_worker()
2025-12-04T10:35:20.7440930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7441080Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7441598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7441763Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7442141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7442356Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7442726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7443018Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7443207Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7443470Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7443549Z ^
2025-12-04T10:35:20.7443935Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7443940Z 
2025-12-04T10:35:20.7444591Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7444601Z 
2025-12-04T10:35:20.7444605Z 
2025-12-04T10:35:20.7444789Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7445491Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7445498Z 
2025-12-04T10:35:20.7445738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7445997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7446088Z frames [('total', 1)]
2025-12-04T10:35:20.7446182Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7446584Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7446778Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7446857Z graph_break []
2025-12-04T10:35:20.7447033Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7447120Z frames [('total', 1)]
2025-12-04T10:35:20.7447213Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7447401Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7447845Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7447928Z graph_break []
2025-12-04T10:35:20.7448050Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7448334Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7448438Z Traceback (most recent call last):
2025-12-04T10:35:20.7448792Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7448913Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7449335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7449546Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7450031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7450200Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7450635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7450752Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7451217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7451490Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7451935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7452055Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7452464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7452567Z     return self._compile_to_module()
2025-12-04T10:35:20.7452974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7453112Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7453550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7453701Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7454121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7454313Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7454815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7454932Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7455362Z   File "/tmp/tmpcf9tmsnl/xk/cxk43d4xa2sy7vpd5g6fnl3uknwvgx5l67o5ohjyql2wtgaj3dcp.py", line 50, in <module>
2025-12-04T10:35:20.7455800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7455888Z     kernel.precompile(
2025-12-04T10:35:20.7456357Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7456456Z     self._precompile_worker()
2025-12-04T10:35:20.7456963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7457114Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7457618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7457853Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7458237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7458442Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7458818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7459159Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7459348Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7459614Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7459681Z ^
2025-12-04T10:35:20.7460065Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7460072Z 
2025-12-04T10:35:20.7460729Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7460736Z 
2025-12-04T10:35:20.7460740Z 
2025-12-04T10:35:20.7460919Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7461619Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7461625Z 
2025-12-04T10:35:20.7461849Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7462026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7462112Z frames [('total', 1)]
2025-12-04T10:35:20.7462203Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7462602Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7462789Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7462869Z graph_break []
2025-12-04T10:35:20.7463049Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7463130Z frames [('total', 1)]
2025-12-04T10:35:20.7463221Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7463404Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7463842Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7463927Z graph_break []
2025-12-04T10:35:20.7464102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7464183Z frames [('total', 1)]
2025-12-04T10:35:20.7464280Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7464462Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7464854Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7464981Z graph_break []
2025-12-04T10:35:20.7465535Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml -
2025-12-04T10:35:20.7465695Z =========================== short test summary info ============================
2025-12-04T10:35:20.7466414Z FAILED [0.6123s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7466670Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7466742Z ^
2025-12-04T10:35:20.7467128Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7467174Z 
2025-12-04T10:35:20.7467791Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7467799Z 
2025-12-04T10:35:20.7467803Z 
2025-12-04T10:35:20.7467979Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7468675Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7468679Z 
2025-12-04T10:35:20.7468901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7469047Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7469213Z ================== 1 failed, 48 deselected, 2 rerun in 3.39s ===================
2025-12-04T10:35:20.7469293Z Got exit code 1
2025-12-04T10:35:20.7469386Z Retrying single test...
2025-12-04T10:35:20.7469830Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml
2025-12-04T10:35:20.7469963Z ============================= test session starts ==============================
2025-12-04T10:35:20.7470256Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7470342Z cachedir: .pytest_cache
2025-12-04T10:35:20.7470786Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7470888Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7470973Z configfile: pytest.ini
2025-12-04T10:35:20.7471430Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7471616Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7472238Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7472335Z Running 1 items in this shard
2025-12-04T10:35:20.7472339Z 
2025-12-04T10:35:20.7473335Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7474019Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7474487Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7475043Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7475532Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7476001Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7476522Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7476951Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7477329Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7477814Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7478227Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7478713Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7479142Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7479586Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7480053Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7480354Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7481828Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7482295Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7483191Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7483728Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7484488Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7485072Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7485869Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7486530Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7487052Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7487690Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7488039Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7488914Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7489021Z ('RERUN', {'yellow': True}) [2.1104s] [100%]
2025-12-04T10:35:20.7490013Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7490692Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7491155Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7491633Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7492053Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7492415Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7492961Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7493399Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7493779Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7494262Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7494630Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7495111Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7495543Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7495994Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7496456Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7496764Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7498234Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7498697Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7499635Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7500239Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7501003Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7501584Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7502382Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7503039Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7503563Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7504196Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7504505Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7505312Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7505421Z ('RERUN', {'yellow': True}) [0.6058s] [100%]
2025-12-04T10:35:20.7506415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7507053Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7507515Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7508150Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7508572Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7508937Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7509523Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7509957Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7510331Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7510812Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7511186Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7511741Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7512173Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7512619Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7513077Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7513378Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7514861Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7515323Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7516264Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7516860Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7517617Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7518198Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7518952Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7519606Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7520130Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7520770Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7521079Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7521881Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7521963Z FAILED [0.6096s] [100%]
2025-12-04T10:35:20.7521968Z 
2025-12-04T10:35:20.7522089Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7522370Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7522476Z Traceback (most recent call last):
2025-12-04T10:35:20.7522815Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7522976Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7523390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7523597Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7524039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7524197Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7524626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7524786Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7525238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7525509Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7526008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7526124Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7526531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7526630Z     return self._compile_to_module()
2025-12-04T10:35:20.7527036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7527172Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7527667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7527776Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7528196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7528390Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7528894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7528996Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7529416Z   File "/tmp/tmp7eie7i9q/g5/cg5dya7d65k3y2oopzyqnsq3d47mcd65o7hqrwxih42wq3v3lpzo.py", line 50, in <module>
2025-12-04T10:35:20.7529808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7529900Z     kernel.precompile(
2025-12-04T10:35:20.7530376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7530471Z     self._precompile_worker()
2025-12-04T10:35:20.7530979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7531131Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7531679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7531847Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7532223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7532424Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7532808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7533088Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7533318Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7533577Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7533643Z ^
2025-12-04T10:35:20.7534038Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7534043Z 
2025-12-04T10:35:20.7534648Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7534654Z 
2025-12-04T10:35:20.7534657Z 
2025-12-04T10:35:20.7534836Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7535580Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7535587Z 
2025-12-04T10:35:20.7535833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7536042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7536130Z frames [('total', 1)]
2025-12-04T10:35:20.7536226Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7536636Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7536823Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7536905Z graph_break []
2025-12-04T10:35:20.7537185Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7537281Z Traceback (most recent call last):
2025-12-04T10:35:20.7537668Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7537787Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7538203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7538413Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7538846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7539007Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7539503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7539619Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7540073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7540345Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7540793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7540911Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7541314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7541494Z     return self._compile_to_module()
2025-12-04T10:35:20.7541906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7542039Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7542476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7542582Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7543004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7543239Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7543736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7543841Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7544276Z   File "/tmp/tmpqnps4uev/ku/ckueenl5uqa3jupn6s6lf27hx5uc54auma3vrxmztpdq4pdwmwxg.py", line 50, in <module>
2025-12-04T10:35:20.7544673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7544763Z     kernel.precompile(
2025-12-04T10:35:20.7545232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7545372Z     self._precompile_worker()
2025-12-04T10:35:20.7545876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7546023Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7546532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7546696Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7547077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7547282Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7547651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7547976Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7548173Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7548436Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7548503Z ^
2025-12-04T10:35:20.7548888Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7548893Z 
2025-12-04T10:35:20.7549503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7549508Z 
2025-12-04T10:35:20.7549512Z 
2025-12-04T10:35:20.7549690Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7550386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7550396Z 
2025-12-04T10:35:20.7550618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7550796Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7550883Z frames [('total', 1)]
2025-12-04T10:35:20.7550973Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7551371Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7551615Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7551699Z graph_break []
2025-12-04T10:35:20.7551877Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7551956Z frames [('total', 1)]
2025-12-04T10:35:20.7552045Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7552234Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7552634Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7552757Z graph_break []
2025-12-04T10:35:20.7552876Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7553157Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7553263Z Traceback (most recent call last):
2025-12-04T10:35:20.7553604Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7553723Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7554135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7554347Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7554799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7555000Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7555436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7555559Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7556010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7556288Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7556731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7556848Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7557265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7557404Z     return self._compile_to_module()
2025-12-04T10:35:20.7557817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7557953Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7558388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7558494Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7558915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7559109Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7559609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7559714Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7560155Z   File "/tmp/tmpkc0cbrvl/e5/ce5bv2ptgvsdftkb3dl6zbv5oultpep6ldfiedv3o5xcswoxaan4.py", line 50, in <module>
2025-12-04T10:35:20.7560549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7560635Z     kernel.precompile(
2025-12-04T10:35:20.7561108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7561240Z     self._precompile_worker()
2025-12-04T10:35:20.7561750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7561899Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7562402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7562574Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7562949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7563191Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7563566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7563856Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7564048Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7564303Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7564372Z ^
2025-12-04T10:35:20.7564762Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7564766Z 
2025-12-04T10:35:20.7565417Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7565425Z 
2025-12-04T10:35:20.7565429Z 
2025-12-04T10:35:20.7565610Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7566301Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7566308Z 
2025-12-04T10:35:20.7566528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7566705Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7566783Z frames [('total', 1)]
2025-12-04T10:35:20.7566877Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7567275Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7567590Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7567675Z graph_break []
2025-12-04T10:35:20.7567851Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7567936Z frames [('total', 1)]
2025-12-04T10:35:20.7568029Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7568210Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7568609Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7568689Z graph_break []
2025-12-04T10:35:20.7568861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7568947Z frames [('total', 1)]
2025-12-04T10:35:20.7569038Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7569217Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7569615Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7569695Z graph_break []
2025-12-04T10:35:20.7570248Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml -
2025-12-04T10:35:20.7570388Z =========================== short test summary info ============================
2025-12-04T10:35:20.7571114Z FAILED [0.6096s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7571402Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7571474Z ^
2025-12-04T10:35:20.7571884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7571895Z 
2025-12-04T10:35:20.7572545Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7572592Z 
2025-12-04T10:35:20.7572596Z 
2025-12-04T10:35:20.7572772Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7573472Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7573477Z 
2025-12-04T10:35:20.7573696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7573843Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7574008Z ================== 1 failed, 187 deselected, 2 rerun in 3.36s ==================
2025-12-04T10:35:20.7574083Z Got exit code 1
2025-12-04T10:35:20.7574218Z Retrying single test...
2025-12-04T10:35:20.7574616Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml
2025-12-04T10:35:20.7574749Z ============================= test session starts ==============================
2025-12-04T10:35:20.7575050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7575137Z cachedir: .pytest_cache
2025-12-04T10:35:20.7575592Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7575690Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7575775Z configfile: pytest.ini
2025-12-04T10:35:20.7576236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7576421Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7577117Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7577211Z Running 1 items in this shard
2025-12-04T10:35:20.7577215Z 
2025-12-04T10:35:20.7578210Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7578854Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7583488Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7584098Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7584525Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7584897Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7585418Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7585930Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7586322Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7586808Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7587193Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7587717Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7588150Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7588608Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7589077Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7589398Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7590875Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7591344Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7592235Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7592778Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7593592Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7594174Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7594932Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7595589Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7596114Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7596752Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7597063Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7597873Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7597982Z ('RERUN', {'yellow': True}) [2.1298s] [100%]
2025-12-04T10:35:20.7598991Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7599631Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7600151Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7600630Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7601053Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7601434Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7601938Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7602421Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7602807Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7603291Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7603671Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7604155Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7604597Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7605084Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7605553Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7605908Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7607349Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7608008Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7608900Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7609439Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7610278Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7610863Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7611620Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7612332Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7612863Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7613501Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7613814Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7614629Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7614741Z ('RERUN', {'yellow': True}) [0.6075s] [100%]
2025-12-04T10:35:20.7615745Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7616435Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7616902Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7617435Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7617859Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7618229Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7618734Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7619243Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7619621Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7620109Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7620484Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7620971Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7621406Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7621896Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7622375Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7622678Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7624117Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7624622Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7625510Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7626052Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7626874Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7627462Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7628213Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7628872Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7629433Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7630072Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7630382Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7631150Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7631242Z FAILED [0.6086s] [100%]
2025-12-04T10:35:20.7631247Z 
2025-12-04T10:35:20.7631364Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7631655Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7631767Z Traceback (most recent call last):
2025-12-04T10:35:20.7632112Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7632249Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7632659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7632871Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7633358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7633522Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7633965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7634089Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7634549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7634874Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7635320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7635455Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7635915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7636018Z     return self._compile_to_module()
2025-12-04T10:35:20.7636434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7636568Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7637009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7637165Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7637583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7637783Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7638278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7638384Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7638846Z   File "/tmp/tmpw3kbw8dv/jd/cjdsncjptmogiwtesbxavcdpsxya2pmdltuxfeayzbyumnabgc3f.py", line 50, in <module>
2025-12-04T10:35:20.7639243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7639340Z     kernel.precompile(
2025-12-04T10:35:20.7639854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7639953Z     self._precompile_worker()
2025-12-04T10:35:20.7640468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7640619Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7641128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7641298Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7641674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7641891Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7642268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7642554Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7642759Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7643024Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7643093Z ^
2025-12-04T10:35:20.7643506Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7643555Z 
2025-12-04T10:35:20.7644180Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7644185Z 
2025-12-04T10:35:20.7644189Z 
2025-12-04T10:35:20.7644385Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7645092Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7645143Z 
2025-12-04T10:35:20.7645378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7645566Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7645652Z frames [('total', 1)]
2025-12-04T10:35:20.7645755Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7646166Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7646357Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7646436Z graph_break []
2025-12-04T10:35:20.7646730Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7646839Z Traceback (most recent call last):
2025-12-04T10:35:20.7647228Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7647352Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7647783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7647995Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7648440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7648605Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7649038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7649162Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7649621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7649937Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7650387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7650515Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7650939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7651041Z     return self._compile_to_module()
2025-12-04T10:35:20.7651457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7651605Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7652043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7652163Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7652584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7652786Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7653299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7653407Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7653895Z   File "/tmp/tmpuuh1g8t8/xy/cxyyllic5xoci6rvylaviilarlo2ha3lnt6wcv2cfhtrict5eybz.py", line 50, in <module>
2025-12-04T10:35:20.7654300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7654391Z     kernel.precompile(
2025-12-04T10:35:20.7654876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7654979Z     self._precompile_worker()
2025-12-04T10:35:20.7655488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7655694Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7656205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7656386Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7656772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7656977Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7657362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7657647Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7657890Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7658160Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7658233Z ^
2025-12-04T10:35:20.7658628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7658633Z 
2025-12-04T10:35:20.7659290Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7659295Z 
2025-12-04T10:35:20.7659299Z 
2025-12-04T10:35:20.7659493Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7660197Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7660205Z 
2025-12-04T10:35:20.7660478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7660671Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7660758Z frames [('total', 1)]
2025-12-04T10:35:20.7660862Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7661265Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7661453Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7661541Z graph_break []
2025-12-04T10:35:20.7661717Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7661798Z frames [('total', 1)]
2025-12-04T10:35:20.7661900Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7662081Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7662479Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7662561Z graph_break []
2025-12-04T10:35:20.7662681Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7662982Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7663086Z Traceback (most recent call last):
2025-12-04T10:35:20.7663475Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7663604Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7664021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7664235Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7664673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7664846Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7665282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7665473Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7665976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7666255Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7666699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7666824Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7667231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7667375Z     return self._compile_to_module()
2025-12-04T10:35:20.7667787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7667924Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7668376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7668487Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7668904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7669109Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7669604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7669710Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7670189Z   File "/tmp/tmpflu603bh/bq/cbqxp44iicb6iufb2ymuzdg3f5fvc2ax7r26kofnfo5i2gsiaskj.py", line 50, in <module>
2025-12-04T10:35:20.7670584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7670679Z     kernel.precompile(
2025-12-04T10:35:20.7671150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7671243Z     self._precompile_worker()
2025-12-04T10:35:20.7671757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7671906Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7672424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7672590Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7672970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7673177Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7673551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7673832Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7674070Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7674329Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7674406Z ^
2025-12-04T10:35:20.7674795Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7674800Z 
2025-12-04T10:35:20.7675407Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7675466Z 
2025-12-04T10:35:20.7675469Z 
2025-12-04T10:35:20.7675653Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7676345Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7676350Z 
2025-12-04T10:35:20.7676584Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7676760Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7676850Z frames [('total', 1)]
2025-12-04T10:35:20.7676947Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7677344Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7677586Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7677666Z graph_break []
2025-12-04T10:35:20.7677842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7677933Z frames [('total', 1)]
2025-12-04T10:35:20.7678023Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7678205Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7678609Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7678687Z graph_break []
2025-12-04T10:35:20.7678866Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7678948Z frames [('total', 1)]
2025-12-04T10:35:20.7679039Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7679223Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7679657Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7679735Z graph_break []
2025-12-04T10:35:20.7680295Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml -
2025-12-04T10:35:20.7680437Z =========================== short test summary info ============================
2025-12-04T10:35:20.7681125Z FAILED [0.6086s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7681390Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7681455Z ^
2025-12-04T10:35:20.7681845Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7681850Z 
2025-12-04T10:35:20.7682459Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7682465Z 
2025-12-04T10:35:20.7682469Z 
2025-12-04T10:35:20.7682653Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7683340Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7683391Z 
2025-12-04T10:35:20.7683620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7683781Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7683946Z ================== 1 failed, 187 deselected, 2 rerun in 3.38s ==================
2025-12-04T10:35:20.7684036Z Got exit code 1
2025-12-04T10:35:20.7684517Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7684879Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.7685335Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml
2025-12-04T10:35:20.7685478Z ============================= test session starts ==============================
2025-12-04T10:35:20.7685793Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7685912Z cachedir: .pytest_cache
2025-12-04T10:35:20.7686389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7686506Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7686600Z configfile: pytest.ini
2025-12-04T10:35:20.7687069Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7687321Z collecting ... collected 188 items / 49 deselected / 139 selected
2025-12-04T10:35:20.7687444Z stepcurrent: skipping 49 already run items.
2025-12-04T10:35:20.7687549Z Running 139 items in this shard
2025-12-04T10:35:20.7687553Z 
2025-12-04T10:35:20.7688002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [2.3742s] [  0%]
2025-12-04T10:35:20.7688448Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6928s] [  1%]
2025-12-04T10:35:20.7689432Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7690124Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7690599Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7691078Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7691521Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7691899Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7692411Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7692859Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7693238Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7693740Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7694158Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7694638Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7695077Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7695536Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7696063Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7696419Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7697865Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7698338Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7699344Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7699890Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7700653Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7701242Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7702037Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7702725Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7703248Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7703890Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7704209Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7704969Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7705090Z ('RERUN', {'yellow': True}) [0.4801s] [  2%]
2025-12-04T10:35:20.7705600Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [0.8779s] [  2%]
2025-12-04T10:35:20.7706105Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.8616s] [  2%]
2025-12-04T10:35:20.7706110Z 
2025-12-04T10:35:20.7706301Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7706571Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7706686Z Traceback (most recent call last):
2025-12-04T10:35:20.7707037Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7707160Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7707590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7707974Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7708610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7708774Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7709216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7709343Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7709807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7710083Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7710609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7710730Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7711150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7711258Z     return self._compile_to_module()
2025-12-04T10:35:20.7711673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7711835Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7712271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7712391Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7712810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7713076Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7713585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7713698Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7714129Z   File "/tmp/tmp_noq3ytb/gu/cgunxsafvni65swzh7z7pgrdxcoe3jhwdf6yibigvusp32vv3tir.py", line 50, in <module>
2025-12-04T10:35:20.7714522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7714613Z     kernel.precompile(
2025-12-04T10:35:20.7715094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7715193Z     self._precompile_worker()
2025-12-04T10:35:20.7715726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7715913Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7716416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7716597Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7716975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7717240Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7717620Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7717904Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7718113Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7718379Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7718451Z ^
2025-12-04T10:35:20.7718850Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7718897Z 
2025-12-04T10:35:20.7719510Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7719515Z 
2025-12-04T10:35:20.7719519Z 
2025-12-04T10:35:20.7719707Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7720394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7720399Z 
2025-12-04T10:35:20.7720625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7720867Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7720948Z frames [('total', 1)]
2025-12-04T10:35:20.7721053Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7721245Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7721641Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7721726Z graph_break []
2025-12-04T10:35:20.7721995Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7722091Z Traceback (most recent call last):
2025-12-04T10:35:20.7722435Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7722551Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7722966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7723215Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7723647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7723813Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7724246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7724373Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7724821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7725087Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7725536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7725660Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7726070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7726180Z     return self._compile_to_module()
2025-12-04T10:35:20.7726588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7726731Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7727209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7727314Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7727740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7727929Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7728431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7728534Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7729020Z   File "/tmp/tmpm4haybre/cg/ccglg3kkkbgqtba77iuoorkipbmgpq6memshhpldgay6cxqq43hp.py", line 84, in <module>
2025-12-04T10:35:20.7729407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.7729498Z     self._wait_futures(scope)
2025-12-04T10:35:20.7729917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.7730011Z     kernel = result.result()
2025-12-04T10:35:20.7730383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.7730480Z     return self.result_fn()
2025-12-04T10:35:20.7730886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.7731033Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.7731364Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7731369Z 
2025-12-04T10:35:20.7731479Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7731583Z Traceback (most recent call last):
2025-12-04T10:35:20.7732043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7732124Z     result = job()
2025-12-04T10:35:20.7732624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7732744Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7733213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7733357Z     self._precompile_worker()
2025-12-04T10:35:20.7733861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7734016Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7734520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7734687Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7735065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7735266Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7735643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7735932Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7736085Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7736346Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7736416Z ^
2025-12-04T10:35:20.7736804Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7736809Z 
2025-12-04T10:35:20.7736812Z 
2025-12-04T10:35:20.7737466Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7737471Z 
2025-12-04T10:35:20.7737475Z 
2025-12-04T10:35:20.7737665Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7738349Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7738357Z 
2025-12-04T10:35:20.7738579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7738805Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7738886Z frames [('total', 1)]
2025-12-04T10:35:20.7738977Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7739255Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7739655Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7739729Z graph_break []
2025-12-04T10:35:20.7739906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7739985Z frames [('total', 1)]
2025-12-04T10:35:20.7740076Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7740257Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7740887Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7740974Z graph_break []
2025-12-04T10:35:20.7741091Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7741355Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7741456Z Traceback (most recent call last):
2025-12-04T10:35:20.7741798Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7741926Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7742335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7742548Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7743058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7743219Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7743651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7743771Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7744223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7744501Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7744935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7745053Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7745467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7745566Z     return self._compile_to_module()
2025-12-04T10:35:20.7745975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7746109Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7746544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7746690Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7747108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7747297Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7747793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7747899Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7748335Z   File "/tmp/tmpq28ktwbd/ej/cejyztkg2iny6hlwuthq35ulchpdk7nttbfwfkq7hmvrbhmv4nrp.py", line 84, in <module>
2025-12-04T10:35:20.7748760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.7748852Z     self._wait_futures(scope)
2025-12-04T10:35:20.7749273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.7749367Z     kernel = result.result()
2025-12-04T10:35:20.7749740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.7749829Z     return self.result_fn()
2025-12-04T10:35:20.7750230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.7750336Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.7750707Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7750711Z 
2025-12-04T10:35:20.7750839Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7750978Z Traceback (most recent call last):
2025-12-04T10:35:20.7751467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7751546Z     result = job()
2025-12-04T10:35:20.7752047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7752167Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7752638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7752732Z     self._precompile_worker()
2025-12-04T10:35:20.7753293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7753441Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7753944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7754108Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7754484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7754684Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7755056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7755335Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7755498Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7755755Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7755823Z ^
2025-12-04T10:35:20.7756212Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7756217Z 
2025-12-04T10:35:20.7756221Z 
2025-12-04T10:35:20.7756827Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7756879Z 
2025-12-04T10:35:20.7756884Z 
2025-12-04T10:35:20.7757068Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7757745Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7757750Z 
2025-12-04T10:35:20.7757975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7758153Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7758277Z frames [('total', 1)]
2025-12-04T10:35:20.7758375Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7758565Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7758960Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7759044Z graph_break []
2025-12-04T10:35:20.7759219Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7759313Z frames [('total', 1)]
2025-12-04T10:35:20.7759401Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7759580Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7760078Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7760201Z graph_break []
2025-12-04T10:35:20.7760373Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7760459Z frames [('total', 1)]
2025-12-04T10:35:20.7760547Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7760727Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7761225Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.7761302Z graph_break []
2025-12-04T10:35:20.7761855Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml -
2025-12-04T10:35:20.7761993Z =========================== short test summary info ============================
2025-12-04T10:35:20.7762829Z FAILED [0.8616s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.7762846Z 
2025-12-04T10:35:20.7762949Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7763054Z Traceback (most recent call last):
2025-12-04T10:35:20.7763525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.7763598Z     result = job()
2025-12-04T10:35:20.7764101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.7764218Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.7764686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.7764776Z     self._precompile_worker()
2025-12-04T10:35:20.7765287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7765435Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7765942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7766104Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7766624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7766831Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7767201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7767481Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7767634Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7767889Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7768000Z ^
2025-12-04T10:35:20.7768383Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7768388Z 
2025-12-04T10:35:20.7768392Z 
2025-12-04T10:35:20.7768999Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7769004Z 
2025-12-04T10:35:20.7769007Z 
2025-12-04T10:35:20.7769184Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7769861Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7769909Z 
2025-12-04T10:35:20.7770132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7770279Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7770466Z ============= 1 failed, 2 passed, 49 deselected, 2 rerun in 5.32s ==============
2025-12-04T10:35:20.7770543Z Got exit code 1
2025-12-04T10:35:20.7770627Z Retrying single test...
2025-12-04T10:35:20.7771025Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml
2025-12-04T10:35:20.7771160Z ============================= test session starts ==============================
2025-12-04T10:35:20.7771451Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7771536Z cachedir: .pytest_cache
2025-12-04T10:35:20.7771983Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7772089Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7772212Z configfile: pytest.ini
2025-12-04T10:35:20.7772678Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7772865Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7773467Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7773560Z Running 1 items in this shard
2025-12-04T10:35:20.7773565Z 
2025-12-04T10:35:20.7774530Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7775176Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7775639Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7776118Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7776585Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7776951Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7777455Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7777892Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7778269Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7778795Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7779257Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7779742Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7780171Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7780613Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7781156Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7781464Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7782896Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7783349Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7784281Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7784816Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7785581Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7786157Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7786907Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7787569Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7788089Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7788767Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7789075Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7789840Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7789948Z ('RERUN', {'yellow': True}) [2.2324s] [100%]
2025-12-04T10:35:20.7790912Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7791590Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7792049Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7792526Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7792989Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7793351Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7793864Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7794295Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7794673Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7795151Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7795560Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7796093Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7796524Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7796970Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7797435Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7797742Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7799163Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7799633Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7800560Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7801093Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7801858Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7802476Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7803230Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7803886Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7804406Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7805089Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7805398Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7806164Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7806271Z ('RERUN', {'yellow': True}) [0.5965s] [100%]
2025-12-04T10:35:20.7807239Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7808272Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7808748Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7809234Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7809652Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7810016Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7810518Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7810954Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7811329Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7811812Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7812250Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7812730Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7813165Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7813611Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7814074Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7814438Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7815866Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7816324Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7817267Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7817805Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7818559Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7819193Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7819992Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7820651Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7821175Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7821810Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7822121Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7822879Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7822960Z FAILED [0.5903s] [100%]
2025-12-04T10:35:20.7822975Z 
2025-12-04T10:35:20.7823098Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7823363Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7823465Z Traceback (most recent call last):
2025-12-04T10:35:20.7823846Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7823969Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7824383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7824595Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7825033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7825201Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7825647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7825874Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7826334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7826603Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7827043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7827165Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7827570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7827711Z     return self._compile_to_module()
2025-12-04T10:35:20.7828117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7828260Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7828698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7828805Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7829223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7829414Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7829915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7830014Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7830463Z   File "/tmp/tmp1a_jzp_w/6d/c6dxsz36vbtu6jr4bsr4pjtozpg44wbnirs25mmysage5t5mvrmk.py", line 50, in <module>
2025-12-04T10:35:20.7830857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7830947Z     kernel.precompile(
2025-12-04T10:35:20.7831416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7831507Z     self._precompile_worker()
2025-12-04T10:35:20.7832018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7832167Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7832668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7832835Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7833216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7833420Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7833793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7834074Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7834308Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7834568Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7834632Z ^
2025-12-04T10:35:20.7835021Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7835026Z 
2025-12-04T10:35:20.7835631Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7835679Z 
2025-12-04T10:35:20.7835683Z 
2025-12-04T10:35:20.7835869Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7836553Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7836558Z 
2025-12-04T10:35:20.7836781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7836966Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7837047Z frames [('total', 1)]
2025-12-04T10:35:20.7837139Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7837543Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7837772Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7837852Z graph_break []
2025-12-04T10:35:20.7838118Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7838217Z Traceback (most recent call last):
2025-12-04T10:35:20.7838558Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7838678Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7839090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7839299Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7839730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7839893Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7840365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7840485Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7840936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7841205Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7841646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7841762Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7842163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7842260Z     return self._compile_to_module()
2025-12-04T10:35:20.7842669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7842807Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7843245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7843347Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7843766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7843999Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7844499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7844608Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7845040Z   File "/tmp/tmpzdi8cq9q/kr/ckrscaiea657dvrltsary4ylwskyylweod6cpsmto36du463ajg5.py", line 50, in <module>
2025-12-04T10:35:20.7845437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7845524Z     kernel.precompile(
2025-12-04T10:35:20.7846084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7846179Z     self._precompile_worker()
2025-12-04T10:35:20.7846688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7846835Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7847337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7847499Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7851489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7851786Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7852173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7852463Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7852657Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7852924Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7852996Z ^
2025-12-04T10:35:20.7853389Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7853394Z 
2025-12-04T10:35:20.7854100Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7854109Z 
2025-12-04T10:35:20.7854159Z 
2025-12-04T10:35:20.7854345Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7855031Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7855037Z 
2025-12-04T10:35:20.7855261Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7855444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7855538Z frames [('total', 1)]
2025-12-04T10:35:20.7855632Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7856087Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7856274Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7856359Z graph_break []
2025-12-04T10:35:20.7856545Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7856630Z frames [('total', 1)]
2025-12-04T10:35:20.7856725Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7856913Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7857304Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7857388Z graph_break []
2025-12-04T10:35:20.7857560Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7857831Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7857941Z Traceback (most recent call last):
2025-12-04T10:35:20.7858283Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7858408Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7858831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7859167Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7859610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7859771Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7860205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7860330Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7860781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7861058Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7861552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7861672Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7862088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7862191Z     return self._compile_to_module()
2025-12-04T10:35:20.7862602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7862741Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7863177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7863288Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7863704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7863941Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7864443Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7864553Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7864987Z   File "/tmp/tmpslxjh15f/6e/c6e2ati7d7zyxmnwtijlozondx4tq7ha42evr6vmnxhzami7xmfj.py", line 50, in <module>
2025-12-04T10:35:20.7865379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7865471Z     kernel.precompile(
2025-12-04T10:35:20.7865943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7866035Z     self._precompile_worker()
2025-12-04T10:35:20.7866543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7866699Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7867202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7867374Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7867751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7868026Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7868406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7868688Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7868882Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7869146Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7869217Z ^
2025-12-04T10:35:20.7869611Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7869657Z 
2025-12-04T10:35:20.7870265Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7870270Z 
2025-12-04T10:35:20.7870274Z 
2025-12-04T10:35:20.7870460Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7871138Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7871143Z 
2025-12-04T10:35:20.7871365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7871591Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7871676Z frames [('total', 1)]
2025-12-04T10:35:20.7871776Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7872175Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7872361Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7872443Z graph_break []
2025-12-04T10:35:20.7872620Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7872704Z frames [('total', 1)]
2025-12-04T10:35:20.7872804Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7872985Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7873381Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7873463Z graph_break []
2025-12-04T10:35:20.7873678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7873766Z frames [('total', 1)]
2025-12-04T10:35:20.7873858Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7874039Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7874433Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7874514Z graph_break []
2025-12-04T10:35:20.7875078Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml -
2025-12-04T10:35:20.7875219Z =========================== short test summary info ============================
2025-12-04T10:35:20.7875925Z FAILED [0.5903s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7876202Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7876271Z ^
2025-12-04T10:35:20.7876661Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7876674Z 
2025-12-04T10:35:20.7877277Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7877281Z 
2025-12-04T10:35:20.7877330Z 
2025-12-04T10:35:20.7877515Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7878195Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7878200Z 
2025-12-04T10:35:20.7878431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7878594Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7878762Z ================== 1 failed, 187 deselected, 2 rerun in 3.45s ==================
2025-12-04T10:35:20.7878887Z Got exit code 1
2025-12-04T10:35:20.7878983Z Retrying single test...
2025-12-04T10:35:20.7879388Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml
2025-12-04T10:35:20.7879533Z ============================= test session starts ==============================
2025-12-04T10:35:20.7879833Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7879928Z cachedir: .pytest_cache
2025-12-04T10:35:20.7880387Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7880491Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7880624Z configfile: pytest.ini
2025-12-04T10:35:20.7881098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7881286Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7881898Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7881994Z Running 1 items in this shard
2025-12-04T10:35:20.7881999Z 
2025-12-04T10:35:20.7882971Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7883619Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7884130Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7884620Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7885041Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7885418Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7885962Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7886407Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7886797Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7887280Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7887661Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7888191Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7888627Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7889075Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7889542Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7889856Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7891325Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7891789Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7892678Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7893255Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7894021Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7894600Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7895355Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7896057Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7896587Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7897222Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7897528Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7898293Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7898408Z ('RERUN', {'yellow': True}) [2.2306s] [100%]
2025-12-04T10:35:20.7899425Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7900105Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7900575Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7901050Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7901473Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7901849Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7902397Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7902835Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7903215Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7903694Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7904065Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7904589Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7905030Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7905478Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7905941Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7906246Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7907915Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7908449Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7909343Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7909878Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7910636Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7911225Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7911982Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7912722Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7913246Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7913881Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7914195Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7915098Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7915205Z ('RERUN', {'yellow': True}) [0.5876s] [100%]
2025-12-04T10:35:20.7916188Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7916821Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7917353Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7917831Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7918262Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7918626Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7919131Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7919564Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7920005Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7920501Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7920871Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7921354Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7921796Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7922245Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7922727Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7923030Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7924494Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7924948Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7925883Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7926473Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7927232Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7927825Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7928576Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7929284Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7929803Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7930445Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7930757Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7931521Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7931652Z FAILED [0.5868s] [100%]
2025-12-04T10:35:20.7931657Z 
2025-12-04T10:35:20.7931774Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7932050Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7932153Z Traceback (most recent call last):
2025-12-04T10:35:20.7932500Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7932627Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7933038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7933248Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7933694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7933857Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7934292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7934416Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7934866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7935138Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7935625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7935751Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7936163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7936263Z     return self._compile_to_module()
2025-12-04T10:35:20.7936684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7936859Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7937296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7937405Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7937831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7938035Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7938532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7938635Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7939288Z   File "/tmp/tmpyaul7sw1/ok/cok5zffncijn2tkbqphtlfw7zd7ky6dze72bc6ubunifdvia6ewh.py", line 50, in <module>
2025-12-04T10:35:20.7939861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7939988Z     kernel.precompile(
2025-12-04T10:35:20.7940459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7940559Z     self._precompile_worker()
2025-12-04T10:35:20.7941074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7941221Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7941725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7941891Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7942324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7942538Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7942914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7943193Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7943388Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7943649Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7943714Z ^
2025-12-04T10:35:20.7944106Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7944111Z 
2025-12-04T10:35:20.7944723Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7944731Z 
2025-12-04T10:35:20.7944735Z 
2025-12-04T10:35:20.7944922Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7945600Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7945605Z 
2025-12-04T10:35:20.7945830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7946052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7946134Z frames [('total', 1)]
2025-12-04T10:35:20.7946237Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7946635Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7946825Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7946909Z graph_break []
2025-12-04T10:35:20.7947174Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7947322Z Traceback (most recent call last):
2025-12-04T10:35:20.7947662Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7947786Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7948206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7948411Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7948849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7949012Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7949444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7949634Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7950084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7950355Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7950803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7950921Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7951333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7951429Z     return self._compile_to_module()
2025-12-04T10:35:20.7951837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7952024Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7952464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7952577Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7952995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7953191Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7953702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7953804Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7954242Z   File "/tmp/tmpr5a8359g/jb/cjbbkbc2mkqgvp4etj4dnbq4cfhvu5ehkoibczcl7cuzci4uxqnp.py", line 50, in <module>
2025-12-04T10:35:20.7954651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7954744Z     kernel.precompile(
2025-12-04T10:35:20.7955222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7955321Z     self._precompile_worker()
2025-12-04T10:35:20.7955881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7956075Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7956581Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7956754Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7957133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7957344Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7957721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7958049Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7958238Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7958511Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7958588Z ^
2025-12-04T10:35:20.7958978Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7958982Z 
2025-12-04T10:35:20.7959586Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7959591Z 
2025-12-04T10:35:20.7959634Z 
2025-12-04T10:35:20.7959826Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7960506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7960513Z 
2025-12-04T10:35:20.7960732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7960918Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7961009Z frames [('total', 1)]
2025-12-04T10:35:20.7961112Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7961511Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7961704Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7961790Z graph_break []
2025-12-04T10:35:20.7961965Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7962094Z frames [('total', 1)]
2025-12-04T10:35:20.7962197Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7962380Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7962770Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7962857Z graph_break []
2025-12-04T10:35:20.7962979Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7963251Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7963351Z Traceback (most recent call last):
2025-12-04T10:35:20.7963692Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7963817Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7964229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7964446Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7964888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7965050Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7965526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7965647Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7966148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7966423Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7966867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7966995Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7967402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7967542Z     return self._compile_to_module()
2025-12-04T10:35:20.7967953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7968090Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7968534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7968638Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7969058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7969254Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7969794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7969901Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7970339Z   File "/tmp/tmpsn7x6ul8/67/c674fw4vlk5qvqpgz5svcbokhhuth32vhiv3hcgelfstal7waxdx.py", line 50, in <module>
2025-12-04T10:35:20.7970731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7970830Z     kernel.precompile(
2025-12-04T10:35:20.7971301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7971393Z     self._precompile_worker()
2025-12-04T10:35:20.7971908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7972098Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7972609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7972774Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7973150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7973363Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7973734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7974013Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7974207Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7974459Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7974539Z ^
2025-12-04T10:35:20.7974926Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7974932Z 
2025-12-04T10:35:20.7975539Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7975544Z 
2025-12-04T10:35:20.7975552Z 
2025-12-04T10:35:20.7975776Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7976461Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7976467Z 
2025-12-04T10:35:20.7976695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7976869Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7976960Z frames [('total', 1)]
2025-12-04T10:35:20.7977057Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7977456Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7977692Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7977776Z graph_break []
2025-12-04T10:35:20.7977951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7978040Z frames [('total', 1)]
2025-12-04T10:35:20.7978130Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7978312Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7978717Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7978795Z graph_break []
2025-12-04T10:35:20.7978980Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7979216Z frames [('total', 1)]
2025-12-04T10:35:20.7979306Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7979491Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7979879Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7979955Z graph_break []
2025-12-04T10:35:20.7980516Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml -
2025-12-04T10:35:20.7980654Z =========================== short test summary info ============================
2025-12-04T10:35:20.7981314Z FAILED [0.5868s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7981575Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7981691Z ^
2025-12-04T10:35:20.7982080Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7982087Z 
2025-12-04T10:35:20.7982687Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7982692Z 
2025-12-04T10:35:20.7982695Z 
2025-12-04T10:35:20.7982880Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7983551Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7983556Z 
2025-12-04T10:35:20.7983781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7983933Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7984101Z ================== 1 failed, 187 deselected, 2 rerun in 3.44s ==================
2025-12-04T10:35:20.7984186Z Got exit code 1
2025-12-04T10:35:20.7984657Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7985004Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.7985449Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml
2025-12-04T10:35:20.7985585Z ============================= test session starts ==============================
2025-12-04T10:35:20.7985879Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7985965Z cachedir: .pytest_cache
2025-12-04T10:35:20.7986411Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7986516Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7986602Z configfile: pytest.ini
2025-12-04T10:35:20.7987127Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7987318Z collecting ... collected 188 items / 52 deselected / 136 selected
2025-12-04T10:35:20.7987432Z stepcurrent: skipping 52 already run items.
2025-12-04T10:35:20.7987532Z Running 136 items in this shard
2025-12-04T10:35:20.7987538Z 
2025-12-04T10:35:20.7988529Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7989171Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7989677Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7990155Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7990580Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.7990943Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.7991448Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7991918Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7992301Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.7992782Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7993152Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.7993636Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7994065Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7994512Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7994983Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7995286Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.7996763Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7997222Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7998119Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7998692Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7999450Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8000024Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8000771Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8001470Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8001992Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8002630Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8002935Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8003694Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8003845Z ('RERUN', {'yellow': True}) [2.1362s] [  0%]
2025-12-04T10:35:20.8004827Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8005467Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8005932Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8006408Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8006830Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8007192Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8007696Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8008452Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8008836Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8009317Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8009684Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8010171Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8010657Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8011103Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8011568Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8011868Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8013298Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8013809Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8014700Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8015229Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8016086Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8016673Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8017426Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8018080Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8018596Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8019317Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8019626Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8020387Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8020542Z ('RERUN', {'yellow': True}) [0.6114s] [  0%]
2025-12-04T10:35:20.8021527Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8022161Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8022627Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8023150Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8023568Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8023933Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8024439Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8024878Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8025300Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8025781Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8026153Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8026635Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8027069Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8027513Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8028020Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8028328Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8029749Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8030204Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8031092Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8031633Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8032458Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8033033Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8033783Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8034437Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8034995Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8035630Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8035989Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8036748Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8036875Z FAILED [0.6103s] [  0%]
2025-12-04T10:35:20.8036880Z 
2025-12-04T10:35:20.8037003Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8037279Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8037385Z Traceback (most recent call last):
2025-12-04T10:35:20.8037724Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8037844Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8038260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8038467Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8038900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8039105Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8039537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8039657Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8040107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8040378Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8040821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8040941Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8041346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8041443Z     return self._compile_to_module()
2025-12-04T10:35:20.8041852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8041991Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8042424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8042527Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8042989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8043183Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8043686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8043786Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8044226Z   File "/tmp/tmpxi43olr7/yn/cynvtary3itzbawlh6affqhesb2xtcqbaisowtegssd6eai4qren.py", line 50, in <module>
2025-12-04T10:35:20.8044619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8044747Z     kernel.precompile(
2025-12-04T10:35:20.8045218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8045310Z     self._precompile_worker()
2025-12-04T10:35:20.8045815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8045965Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8046515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8046683Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8047103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8047307Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8047680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8047959Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8048149Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8048408Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8048478Z ^
2025-12-04T10:35:20.8048866Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8048871Z 
2025-12-04T10:35:20.8049520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8049528Z 
2025-12-04T10:35:20.8049534Z 
2025-12-04T10:35:20.8049715Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8050409Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8050414Z 
2025-12-04T10:35:20.8050636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8050821Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8050901Z frames [('total', 1)]
2025-12-04T10:35:20.8050991Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8051390Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8051578Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8051660Z graph_break []
2025-12-04T10:35:20.8051933Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8052033Z Traceback (most recent call last):
2025-12-04T10:35:20.8052372Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8052490Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8052942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8053168Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8053628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8053798Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8054269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8054438Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8054922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8055212Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8055685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8055813Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8056246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8056351Z     return self._compile_to_module()
2025-12-04T10:35:20.8056794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8056972Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8057414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8057520Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8057939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8058134Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8058628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8058738Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8059224Z   File "/tmp/tmp550sq1hj/o5/co5t4hufn7rxg6kbmom234p6bkqznscbt2mhsj7pdu4i2zqsfowc.py", line 50, in <module>
2025-12-04T10:35:20.8059667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8059754Z     kernel.precompile(
2025-12-04T10:35:20.8060224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8060328Z     self._precompile_worker()
2025-12-04T10:35:20.8060831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8060979Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8061485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8061648Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8062026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8062233Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8062604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8062888Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8063076Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8063379Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8063448Z ^
2025-12-04T10:35:20.8063831Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8063836Z 
2025-12-04T10:35:20.8064446Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8064453Z 
2025-12-04T10:35:20.8064459Z 
2025-12-04T10:35:20.8064637Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8065372Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8065377Z 
2025-12-04T10:35:20.8065596Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8065775Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8065860Z frames [('total', 1)]
2025-12-04T10:35:20.8065950Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8066349Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8066532Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8066651Z graph_break []
2025-12-04T10:35:20.8066832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8066910Z frames [('total', 1)]
2025-12-04T10:35:20.8067002Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8067183Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8067586Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8067696Z graph_break []
2025-12-04T10:35:20.8067863Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8068230Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8068376Z Traceback (most recent call last):
2025-12-04T10:35:20.8068813Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8068970Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8069480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8069695Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8070133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8070294Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8070726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8070847Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8071301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8071575Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8072018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8072137Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8072544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8072639Z     return self._compile_to_module()
2025-12-04T10:35:20.8073045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8073227Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8073663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8073773Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8074189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8074384Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8074883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8075031Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8075466Z   File "/tmp/tmp15sxofn6/zk/czkc2el7owodv7cp32zudnyv5dcasm6proasoefqpat425aoh4kw.py", line 50, in <module>
2025-12-04T10:35:20.8075861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8075947Z     kernel.precompile(
2025-12-04T10:35:20.8076418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8076508Z     self._precompile_worker()
2025-12-04T10:35:20.8077010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8077207Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8077709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8077878Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8078253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8078456Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8078834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8079112Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8079307Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8079612Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8079686Z ^
2025-12-04T10:35:20.8080081Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8080089Z 
2025-12-04T10:35:20.8080691Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8080696Z 
2025-12-04T10:35:20.8080700Z 
2025-12-04T10:35:20.8080882Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8081566Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8081571Z 
2025-12-04T10:35:20.8081796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8081979Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8082060Z frames [('total', 1)]
2025-12-04T10:35:20.8082156Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8082551Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8082741Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8082821Z graph_break []
2025-12-04T10:35:20.8083124Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8083204Z frames [('total', 1)]
2025-12-04T10:35:20.8083299Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8083477Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8083977Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8084061Z graph_break []
2025-12-04T10:35:20.8084240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8084326Z frames [('total', 1)]
2025-12-04T10:35:20.8084466Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8084648Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8085046Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8085122Z graph_break []
2025-12-04T10:35:20.8085686Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml -
2025-12-04T10:35:20.8085828Z =========================== short test summary info ============================
2025-12-04T10:35:20.8086503Z FAILED [0.6103s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8086810Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8086880Z ^
2025-12-04T10:35:20.8087276Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8087285Z 
2025-12-04T10:35:20.8087885Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8087889Z 
2025-12-04T10:35:20.8087895Z 
2025-12-04T10:35:20.8088072Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8088758Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8088763Z 
2025-12-04T10:35:20.8088987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8089188Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8089354Z ================== 1 failed, 52 deselected, 2 rerun in 3.39s ===================
2025-12-04T10:35:20.8089438Z Got exit code 1
2025-12-04T10:35:20.8089528Z Retrying single test...
2025-12-04T10:35:20.8089923Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml
2025-12-04T10:35:20.8090056Z ============================= test session starts ==============================
2025-12-04T10:35:20.8090350Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8090439Z cachedir: .pytest_cache
2025-12-04T10:35:20.8090887Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8090989Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8091078Z configfile: pytest.ini
2025-12-04T10:35:20.8091538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8091724Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8092346Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8092441Z Running 1 items in this shard
2025-12-04T10:35:20.8092446Z 
2025-12-04T10:35:20.8093475Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8094118Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8094581Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8095103Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8095523Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8095893Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8096402Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8096832Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8097255Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8097738Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8098111Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8098598Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8099085Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8099554Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8100071Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8100387Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8101822Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8102282Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8103171Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8103706Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8104517Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8105098Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8105907Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8106566Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8107136Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8107986Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8108318Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8109076Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8109297Z ('RERUN', {'yellow': True}) [2.1380s] [100%]
2025-12-04T10:35:20.8110285Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8110927Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8111394Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8111869Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8112344Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8112727Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8113231Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8113667Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8114043Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8114520Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8114893Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8115373Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8115851Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8116297Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8116815Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8120962Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8122424Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8122988Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8123882Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8124427Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8125196Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8125850Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8126614Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8127277Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8127815Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8128496Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8128819Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8129580Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8129696Z ('RERUN', {'yellow': True}) [0.6123s] [100%]
2025-12-04T10:35:20.8130693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8131337Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8131820Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8132301Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8132776Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8133145Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8133649Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8134087Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8134464Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8134999Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8135369Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8135856Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8136289Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8136736Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8137245Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8137551Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8138984Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8139515Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8140451Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8140991Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8141748Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8142333Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8143087Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8143754Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8144278Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8144962Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8145277Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8146038Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8146130Z FAILED [0.6128s] [100%]
2025-12-04T10:35:20.8146136Z 
2025-12-04T10:35:20.8146253Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8146576Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8146680Z Traceback (most recent call last):
2025-12-04T10:35:20.8147021Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8147160Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8147576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8147785Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8148230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8148436Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8148881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8149008Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8149462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8149735Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8150175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8150302Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8150711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8150813Z     return self._compile_to_module()
2025-12-04T10:35:20.8151267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8151403Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8151843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8151953Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8152373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8152572Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8153066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8153169Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8153598Z   File "/tmp/tmpqpysmj_v/rl/crlee4xho3sfuulocci4ks62a5c4qqccrykw6r5bpu3nv64errlx.py", line 50, in <module>
2025-12-04T10:35:20.8153987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8154077Z     kernel.precompile(
2025-12-04T10:35:20.8154557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8154654Z     self._precompile_worker()
2025-12-04T10:35:20.8155208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8155356Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8155858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8156026Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8156407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8156620Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8157059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8157342Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8157540Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8157800Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8157866Z ^
2025-12-04T10:35:20.8158267Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8158272Z 
2025-12-04T10:35:20.8158883Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8158928Z 
2025-12-04T10:35:20.8158932Z 
2025-12-04T10:35:20.8159121Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8159813Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8159819Z 
2025-12-04T10:35:20.8160048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8160230Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8160313Z frames [('total', 1)]
2025-12-04T10:35:20.8160409Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8160812Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8161013Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8161136Z graph_break []
2025-12-04T10:35:20.8161419Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8161532Z Traceback (most recent call last):
2025-12-04T10:35:20.8161873Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8161997Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8162417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8162625Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8163071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8163234Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8163676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8163800Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8164254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8164531Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8165025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8165147Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8165561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8165664Z     return self._compile_to_module()
2025-12-04T10:35:20.8166082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8166226Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8166661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8166815Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8167232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8167425Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8167928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8168032Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8168456Z   File "/tmp/tmp65vf618w/xg/cxgcneslrqpl3d4g7il3uy432ce4emgcbefjpi7ub4gms3ms35mq.py", line 50, in <module>
2025-12-04T10:35:20.8168899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8168989Z     kernel.precompile(
2025-12-04T10:35:20.8169472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8169567Z     self._precompile_worker()
2025-12-04T10:35:20.8170075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8170233Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8170738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8170915Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8171296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8171555Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8171939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8172224Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8172415Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8172689Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8172761Z ^
2025-12-04T10:35:20.8173161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8173166Z 
2025-12-04T10:35:20.8173768Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8173776Z 
2025-12-04T10:35:20.8173780Z 
2025-12-04T10:35:20.8173963Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8174652Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8174658Z 
2025-12-04T10:35:20.8174879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8175104Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8175187Z frames [('total', 1)]
2025-12-04T10:35:20.8175282Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8175691Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8175876Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8175964Z graph_break []
2025-12-04T10:35:20.8176143Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8176223Z frames [('total', 1)]
2025-12-04T10:35:20.8176319Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8176542Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8176938Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8177022Z graph_break []
2025-12-04T10:35:20.8177148Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8177429Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8177529Z Traceback (most recent call last):
2025-12-04T10:35:20.8177871Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8178001Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8178459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8178668Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8179156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8179315Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8179760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8179878Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8180328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8180695Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8181182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8181312Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8181718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8181818Z     return self._compile_to_module()
2025-12-04T10:35:20.8182231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8182369Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8182813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8182921Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8183339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8183543Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8184041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8184148Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8184581Z   File "/tmp/tmp70tlfw8j/j4/cj4kyrn55u2oea2ot3ifruisxtqfr3swourwuwqnhkj7pz74shex.py", line 50, in <module>
2025-12-04T10:35:20.8185016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8185113Z     kernel.precompile(
2025-12-04T10:35:20.8185583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8185678Z     self._precompile_worker()
2025-12-04T10:35:20.8186191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8186349Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8186860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8187072Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8187459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8187674Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8188048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8188338Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8188533Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8188791Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8188913Z ^
2025-12-04T10:35:20.8189301Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8189309Z 
2025-12-04T10:35:20.8189914Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8189919Z 
2025-12-04T10:35:20.8189927Z 
2025-12-04T10:35:20.8190112Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8190802Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8190807Z 
2025-12-04T10:35:20.8191035Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8191218Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8191342Z frames [('total', 1)]
2025-12-04T10:35:20.8191448Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8191850Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8192043Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8192125Z graph_break []
2025-12-04T10:35:20.8192304Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8192390Z frames [('total', 1)]
2025-12-04T10:35:20.8192486Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8192671Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8193077Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8193155Z graph_break []
2025-12-04T10:35:20.8193341Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8193426Z frames [('total', 1)]
2025-12-04T10:35:20.8193520Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8193710Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8194106Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8194187Z graph_break []
2025-12-04T10:35:20.8194818Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml -
2025-12-04T10:35:20.8194966Z =========================== short test summary info ============================
2025-12-04T10:35:20.8195646Z FAILED [0.6128s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8195910Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8195985Z ^
2025-12-04T10:35:20.8196376Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8196424Z 
2025-12-04T10:35:20.8197025Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8197030Z 
2025-12-04T10:35:20.8197034Z 
2025-12-04T10:35:20.8197219Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8197903Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8197907Z 
2025-12-04T10:35:20.8198136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8198329Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8198497Z ================== 1 failed, 187 deselected, 2 rerun in 3.40s ==================
2025-12-04T10:35:20.8198587Z Got exit code 1
2025-12-04T10:35:20.8198670Z Retrying single test...
2025-12-04T10:35:20.8199074Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml
2025-12-04T10:35:20.8199215Z ============================= test session starts ==============================
2025-12-04T10:35:20.8199507Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8199601Z cachedir: .pytest_cache
2025-12-04T10:35:20.8200053Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8200155Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8200246Z configfile: pytest.ini
2025-12-04T10:35:20.8200749Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8200938Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8201559Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8201650Z Running 1 items in this shard
2025-12-04T10:35:20.8201655Z 
2025-12-04T10:35:20.8202650Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8203286Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8203760Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8204241Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8204665Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8205093Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8205605Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8206094Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8206475Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8206957Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8207380Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8208132Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8208572Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8209019Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8209483Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8209875Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8211312Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8211773Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8212717Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8213267Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8214027Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8214617Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8215367Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8216082Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8216604Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8217300Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8217618Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8218377Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8218491Z ('RERUN', {'yellow': True}) [2.1103s] [100%]
2025-12-04T10:35:20.8219531Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8220225Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8220694Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8221172Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8221608Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8222018Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8222530Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8222975Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8223357Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8223854Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8224230Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8224752Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8225190Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8225641Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8226117Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8226422Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8227869Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8228331Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8229263Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8229802Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8230561Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8231151Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8231952Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8232632Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8233153Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8233798Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8234151Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8234911Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8235027Z ('RERUN', {'yellow': True}) [0.6017s] [100%]
2025-12-04T10:35:20.8236070Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8236773Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8237243Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8237724Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8238155Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8238530Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8239041Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8239476Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.8239868Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = -448.0
2025-12-04T10:35:20.8240354Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.8240729Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = 448.0
2025-12-04T10:35:20.8241263Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.8241699Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.8242153Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.8242618Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.8242919Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8244895Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8245352Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8246304Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8246885Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8247653Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8248232Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8248981Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8249693Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8250216Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8250861Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8251169Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8251938Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8252020Z FAILED [0.6050s] [100%]
2025-12-04T10:35:20.8252027Z 
2025-12-04T10:35:20.8252148Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8252434Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8252538Z Traceback (most recent call last):
2025-12-04T10:35:20.8252889Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8253070Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8253486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8253700Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8254135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8254298Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8254738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8254900Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8255369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8255646Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8256091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8256220Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8256625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8256729Z     return self._compile_to_module()
2025-12-04T10:35:20.8257185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8257320Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8257766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8257872Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8258295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8258497Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8258997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8259155Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8259586Z   File "/tmp/tmprjtwvcfc/el/cel3jf55yv3fdcxtist7gkgqhdy4whuzcb3wsuhdw7szv3qsvqe2.py", line 50, in <module>
2025-12-04T10:35:20.8260109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8260212Z     kernel.precompile(
2025-12-04T10:35:20.8260680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8260782Z     self._precompile_worker()
2025-12-04T10:35:20.8261297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8261445Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8261955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8262121Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8262504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8262720Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8263093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8263385Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8263577Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8263883Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8263961Z ^
2025-12-04T10:35:20.8264351Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8264356Z 
2025-12-04T10:35:20.8264962Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8264972Z 
2025-12-04T10:35:20.8264976Z 
2025-12-04T10:35:20.8265153Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8265891Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8265896Z 
2025-12-04T10:35:20.8266122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8266300Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8266386Z frames [('total', 1)]
2025-12-04T10:35:20.8266482Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8266886Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8267080Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8267196Z graph_break []
2025-12-04T10:35:20.8267476Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8267577Z Traceback (most recent call last):
2025-12-04T10:35:20.8267915Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8268042Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8268453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8268658Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8269101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8269259Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8269738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8269860Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8270315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8270592Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8271035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8271156Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8271562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8271659Z     return self._compile_to_module()
2025-12-04T10:35:20.8272071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8272208Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8272643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8272758Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8273178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8273373Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8273913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8274016Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8274450Z   File "/tmp/tmpgifzhj15/jn/cjnekpnjz632m25jtpp77kcxngtyb7yx4vqxkyzhoqszpcrzjbox.py", line 50, in <module>
2025-12-04T10:35:20.8274840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8274939Z     kernel.precompile(
2025-12-04T10:35:20.8275409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8275548Z     self._precompile_worker()
2025-12-04T10:35:20.8276111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8276258Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8276759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8276924Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8277304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8277582Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8277952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8278234Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8278430Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8278684Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8278751Z ^
2025-12-04T10:35:20.8279140Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8279145Z 
2025-12-04T10:35:20.8279749Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8279754Z 
2025-12-04T10:35:20.8279760Z 
2025-12-04T10:35:20.8279987Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8280675Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8280682Z 
2025-12-04T10:35:20.8280907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8281083Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8281165Z frames [('total', 1)]
2025-12-04T10:35:20.8281262Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8281660Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8281846Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8281928Z graph_break []
2025-12-04T10:35:20.8282102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8282190Z frames [('total', 1)]
2025-12-04T10:35:20.8282279Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8282460Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8282855Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8282931Z graph_break []
2025-12-04T10:35:20.8283050Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8283372Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8283473Z Traceback (most recent call last):
2025-12-04T10:35:20.8283816Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8283938Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8284348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8284563Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8285041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8285206Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8285633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8285752Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8286206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8286474Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8286920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8287084Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8287485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8287585Z     return self._compile_to_module()
2025-12-04T10:35:20.8287996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8288129Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8288565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8288668Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8289091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8289280Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8289824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8289932Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8290362Z   File "/tmp/tmprlwjb0us/sr/csrc5rbquzx4tojj4mzmnf5qdhfus6dc6dgsqjki3e3xcrayg3gl.py", line 50, in <module>
2025-12-04T10:35:20.8290758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8290847Z     kernel.precompile(
2025-12-04T10:35:20.8291317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8291418Z     self._precompile_worker()
2025-12-04T10:35:20.8291924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8292071Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8292578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8292745Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8293128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8293329Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8293750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8294034Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8294224Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8294481Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8294552Z ^
2025-12-04T10:35:20.8294938Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8294983Z 
2025-12-04T10:35:20.8295593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8295599Z 
2025-12-04T10:35:20.8295603Z 
2025-12-04T10:35:20.8295803Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8296527Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8296532Z 
2025-12-04T10:35:20.8296757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8296934Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8297062Z frames [('total', 1)]
2025-12-04T10:35:20.8297152Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8297550Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8297735Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8297816Z graph_break []
2025-12-04T10:35:20.8297997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8298083Z frames [('total', 1)]
2025-12-04T10:35:20.8298171Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8298353Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8298744Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8298826Z graph_break []
2025-12-04T10:35:20.8299002Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8299203Z frames [('total', 1)]
2025-12-04T10:35:20.8299297Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8299480Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8299867Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8299948Z graph_break []
2025-12-04T10:35:20.8300509Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml -
2025-12-04T10:35:20.8300655Z =========================== short test summary info ============================
2025-12-04T10:35:20.8301328Z FAILED [0.6050s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8301587Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8301666Z ^
2025-12-04T10:35:20.8302052Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8302059Z 
2025-12-04T10:35:20.8302661Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8302666Z 
2025-12-04T10:35:20.8302670Z 
2025-12-04T10:35:20.8302892Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8303575Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8303580Z 
2025-12-04T10:35:20.8303802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8303954Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8304127Z ================== 1 failed, 187 deselected, 2 rerun in 3.35s ==================
2025-12-04T10:35:20.8304245Z Got exit code 1
2025-12-04T10:35:20.8304725Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8305084Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.8305481Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml
2025-12-04T10:35:20.8305619Z ============================= test session starts ==============================
2025-12-04T10:35:20.8305907Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8305998Z cachedir: .pytest_cache
2025-12-04T10:35:20.8306451Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8306595Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8306680Z configfile: pytest.ini
2025-12-04T10:35:20.8307143Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8307332Z collecting ... collected 188 items / 53 deselected / 135 selected
2025-12-04T10:35:20.8307448Z stepcurrent: skipping 53 already run items.
2025-12-04T10:35:20.8307538Z Running 135 items in this shard
2025-12-04T10:35:20.8307543Z 
2025-12-04T10:35:20.8308184Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [2.3574s] [  0%]
2025-12-04T10:35:20.8308633Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6871s] [  1%]
2025-12-04T10:35:20.8309689Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8310338Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8310803Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8311277Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8311704Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8312068Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8312533Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8312912Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8313392Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8313827Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8314306Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8314752Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8315218Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8315616Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8319533Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8320017Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8320914Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8321522Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8322292Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8322901Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8323656Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8324321Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8324840Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8325486Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8325793Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8326554Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8326666Z ('RERUN', {'yellow': True}) [0.4334s] [  2%]
2025-12-04T10:35:20.8327173Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [0.8295s] [  2%]
2025-12-04T10:35:20.8327617Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.8208s] [  2%]
2025-12-04T10:35:20.8327623Z 
2025-12-04T10:35:20.8327736Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8328049Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8328157Z Traceback (most recent call last):
2025-12-04T10:35:20.8328500Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8328627Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8329040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8329250Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8329732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8329895Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8330331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8330455Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8330990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8331265Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8331705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8331868Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8332278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8332377Z     return self._compile_to_module()
2025-12-04T10:35:20.8332788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8332923Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8333359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8333470Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8333884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8334079Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8334578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8334681Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8335117Z   File "/tmp/tmp4fmv9ce9/qk/cqkj37cahcu2akcpr46yuu6gzzggkumam7fyykhm7c7rru63cx3r.py", line 48, in <module>
2025-12-04T10:35:20.8335506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8335593Z     kernel.precompile(
2025-12-04T10:35:20.8336115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8336212Z     self._precompile_worker()
2025-12-04T10:35:20.8336717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8336865Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8337368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8337535Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8337912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8338122Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8338536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8338820Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8339012Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8339335Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8339412Z ^
2025-12-04T10:35:20.8339798Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8339849Z 
2025-12-04T10:35:20.8340459Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8340464Z 
2025-12-04T10:35:20.8340468Z 
2025-12-04T10:35:20.8340658Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8341387Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8341392Z 
2025-12-04T10:35:20.8341619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8341801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8341924Z frames [('total', 1)]
2025-12-04T10:35:20.8342021Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8342211Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8342409Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8342494Z graph_break []
2025-12-04T10:35:20.8342757Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8342863Z Traceback (most recent call last):
2025-12-04T10:35:20.8343203Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8343323Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8343738Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8343947Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8344384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8344546Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8344981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8345101Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8345559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8345824Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8346315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8346434Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8346846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8346945Z     return self._compile_to_module()
2025-12-04T10:35:20.8347354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8347488Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8347929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8348080Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8348500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8348695Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8349196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8349303Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8349743Z   File "/tmp/tmpow8jft64/jo/cjomr743uymceqrqlwtpvyhmrrneyirmtgtw24mimmkavebltcz5.py", line 80, in <module>
2025-12-04T10:35:20.8350172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.8350264Z     self._wait_futures(scope)
2025-12-04T10:35:20.8350694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.8350792Z     kernel = result.result()
2025-12-04T10:35:20.8351163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.8351302Z     return self.result_fn()
2025-12-04T10:35:20.8351705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.8351846Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.8352175Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.8352180Z 
2025-12-04T10:35:20.8352287Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8352395Z Traceback (most recent call last):
2025-12-04T10:35:20.8352851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.8352927Z     result = job()
2025-12-04T10:35:20.8353434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.8353549Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.8354018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.8354119Z     self._precompile_worker()
2025-12-04T10:35:20.8354625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8354776Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8355281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8355442Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8355865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8356079Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8356456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8356733Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8356888Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8357148Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8357218Z ^
2025-12-04T10:35:20.8357599Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8357611Z 
2025-12-04T10:35:20.8357615Z 
2025-12-04T10:35:20.8358292Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8358298Z 
2025-12-04T10:35:20.8358301Z 
2025-12-04T10:35:20.8358480Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8359160Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8359165Z 
2025-12-04T10:35:20.8359386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8359570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8359695Z frames [('total', 1)]
2025-12-04T10:35:20.8359785Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8359973Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8360168Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8360247Z graph_break []
2025-12-04T10:35:20.8360422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8360505Z frames [('total', 1)]
2025-12-04T10:35:20.8360598Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8360822Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8361122Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.8361244Z graph_break []
2025-12-04T10:35:20.8361360Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8361624Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8361733Z Traceback (most recent call last):
2025-12-04T10:35:20.8362072Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8362199Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8362614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8362820Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8363258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8363417Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8363857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8363976Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8364427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8364699Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8365141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8365260Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8365673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8365771Z     return self._compile_to_module()
2025-12-04T10:35:20.8366182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8366319Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8366753Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8366865Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8367284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8367527Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8368025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8368128Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8368567Z   File "/tmp/tmp9wf8bfri/ij/cijkd6tblkaqxfxpirbwvg3pdzqg6fqsfv4argbcwvnotcrhihbq.py", line 80, in <module>
2025-12-04T10:35:20.8368947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.8369038Z     self._wait_futures(scope)
2025-12-04T10:35:20.8369501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.8369593Z     kernel = result.result()
2025-12-04T10:35:20.8369973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.8370060Z     return self.result_fn()
2025-12-04T10:35:20.8370464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.8370616Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.8370940Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.8370945Z 
2025-12-04T10:35:20.8371053Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8371198Z Traceback (most recent call last):
2025-12-04T10:35:20.8371651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.8371734Z     result = job()
2025-12-04T10:35:20.8372237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.8372349Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.8372823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.8372914Z     self._precompile_worker()
2025-12-04T10:35:20.8373422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8373566Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8374073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8374237Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8374616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8374817Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8375196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8379543Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8379723Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8379988Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8380061Z ^
2025-12-04T10:35:20.8380461Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8380470Z 
2025-12-04T10:35:20.8380474Z 
2025-12-04T10:35:20.8381080Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8381088Z 
2025-12-04T10:35:20.8381092Z 
2025-12-04T10:35:20.8381279Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8382026Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8382033Z 
2025-12-04T10:35:20.8382266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8382448Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8382537Z frames [('total', 1)]
2025-12-04T10:35:20.8382643Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8382829Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8383025Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8383157Z graph_break []
2025-12-04T10:35:20.8383331Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8383424Z frames [('total', 1)]
2025-12-04T10:35:20.8383522Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8383705Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8384016Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.8384100Z graph_break []
2025-12-04T10:35:20.8384330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8384425Z frames [('total', 1)]
2025-12-04T10:35:20.8384518Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8384742Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8385047Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.8385127Z graph_break []
2025-12-04T10:35:20.8385691Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml -
2025-12-04T10:35:20.8385835Z =========================== short test summary info ============================
2025-12-04T10:35:20.8386644Z FAILED [0.8208s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.8386658Z 
2025-12-04T10:35:20.8386766Z Name=triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8386867Z Traceback (most recent call last):
2025-12-04T10:35:20.8387350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.8387438Z     result = job()
2025-12-04T10:35:20.8387948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.8388072Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.8388545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.8388650Z     self._precompile_worker()
2025-12-04T10:35:20.8389160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8389312Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8389839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8390010Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8390390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8390601Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8390972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8391265Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8391464Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8391725Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8391810Z ^
2025-12-04T10:35:20.8392194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8392199Z 
2025-12-04T10:35:20.8392205Z 
2025-12-04T10:35:20.8392826Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8392871Z 
2025-12-04T10:35:20.8392875Z 
2025-12-04T10:35:20.8393057Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8393738Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8393750Z 
2025-12-04T10:35:20.8393972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8394170Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8394361Z ============= 1 failed, 2 passed, 53 deselected, 2 rerun in 5.17s ==============
2025-12-04T10:35:20.8394443Z Got exit code 1
2025-12-04T10:35:20.8394536Z Retrying single test...
2025-12-04T10:35:20.8395017Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml
2025-12-04T10:35:20.8395152Z ============================= test session starts ==============================
2025-12-04T10:35:20.8395458Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8395547Z cachedir: .pytest_cache
2025-12-04T10:35:20.8396041Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8396153Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8396239Z configfile: pytest.ini
2025-12-04T10:35:20.8396699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8396895Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8397501Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8397603Z Running 1 items in this shard
2025-12-04T10:35:20.8397610Z 
2025-12-04T10:35:20.8398582Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8399226Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8399695Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8400173Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8400602Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8400975Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8401445Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8401867Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8402352Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8402729Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8403211Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8403663Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8404168Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8404476Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8405958Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8406451Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8407349Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8408134Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8408959Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8409705Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8410613Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8411279Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8411799Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8412442Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8412758Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8413523Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8413637Z ('RERUN', {'yellow': True}) [2.2308s] [100%]
2025-12-04T10:35:20.8414729Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8415370Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8415883Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8416371Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8416852Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8417225Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8417687Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8418135Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8418627Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8419115Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8419611Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8420063Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8420533Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8420843Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8422264Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8422733Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8423628Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8424167Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8425018Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8425613Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8426419Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8427128Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8427702Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8428387Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8428768Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8429585Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8429715Z ('RERUN', {'yellow': True}) [0.5482s] [100%]
2025-12-04T10:35:20.8430882Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8431519Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8432027Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8432507Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8432938Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8433302Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8433765Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8434144Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8434627Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8435010Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8435490Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8435989Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8436462Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8436764Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8438193Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8438698Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8439598Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8440129Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8440899Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8441521Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8442274Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8443008Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8443566Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8444215Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8444524Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8445297Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8445383Z FAILED [0.5429s] [100%]
2025-12-04T10:35:20.8445390Z 
2025-12-04T10:35:20.8445505Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8445775Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8445883Z Traceback (most recent call last):
2025-12-04T10:35:20.8446225Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8446357Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8446770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8446985Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8447422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8447583Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8448026Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8448143Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8448604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8448875Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8449317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8449446Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8449896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8449994Z     return self._compile_to_module()
2025-12-04T10:35:20.8450411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8450546Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8450987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8451100Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8451525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8451771Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8452268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8452378Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8452823Z   File "/tmp/tmpvkc44bpz/bu/cbutcjz7uyiptowv62ao6jtzlcwuiuqbhwqfxlslc23cblleqf5s.py", line 48, in <module>
2025-12-04T10:35:20.8453261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8453358Z     kernel.precompile(
2025-12-04T10:35:20.8453834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8453968Z     self._precompile_worker()
2025-12-04T10:35:20.8454485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8454636Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8455152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8455322Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8455707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8455934Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8456344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8456634Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8456827Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8457089Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8457168Z ^
2025-12-04T10:35:20.8457556Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8457561Z 
2025-12-04T10:35:20.8458179Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8458186Z 
2025-12-04T10:35:20.8458190Z 
2025-12-04T10:35:20.8458371Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8459154Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8459172Z 
2025-12-04T10:35:20.8459401Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8459583Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8459672Z frames [('total', 1)]
2025-12-04T10:35:20.8459768Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8459972Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8460212Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8460296Z graph_break []
2025-12-04T10:35:20.8460569Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8460675Z Traceback (most recent call last):
2025-12-04T10:35:20.8461015Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8461149Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8461561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8461816Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8462259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8462418Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8462854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8463017Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8463471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8463746Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8464227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8464350Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8464760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8464856Z     return self._compile_to_module()
2025-12-04T10:35:20.8465273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8465406Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8465848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8465960Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8466377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8466587Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8467083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8467188Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8467632Z   File "/tmp/tmpck8xdkiw/aq/caqur3gmqqhwrfsrqzn2p6dnz4e6sudvv6abescj6x3g2dbv7d5k.py", line 48, in <module>
2025-12-04T10:35:20.8468032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8468122Z     kernel.precompile(
2025-12-04T10:35:20.8468611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8468707Z     self._precompile_worker()
2025-12-04T10:35:20.8469229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8469379Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8469885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8470053Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8470481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8470696Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8471069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8471350Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8471557Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8471815Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8471926Z ^
2025-12-04T10:35:20.8472319Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8472324Z 
2025-12-04T10:35:20.8472931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8472938Z 
2025-12-04T10:35:20.8472942Z 
2025-12-04T10:35:20.8473128Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8473854Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8473860Z 
2025-12-04T10:35:20.8474125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8474309Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8474395Z frames [('total', 1)]
2025-12-04T10:35:20.8474494Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8474697Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8474882Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8474965Z graph_break []
2025-12-04T10:35:20.8475144Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8475230Z frames [('total', 1)]
2025-12-04T10:35:20.8475325Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8475511Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8475729Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8475816Z graph_break []
2025-12-04T10:35:20.8475964Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8476238Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8476340Z Traceback (most recent call last):
2025-12-04T10:35:20.8476681Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8476808Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8477223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8477442Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8477884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8478040Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8478483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8478605Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8479070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8479338Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8479824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8479952Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8480360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8480471Z     return self._compile_to_module()
2025-12-04T10:35:20.8480881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8481022Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8481475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8481653Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8482070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8482276Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8482775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8482925Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8483361Z   File "/tmp/tmpnv2y0ye5/dz/cdzykcvs42m7nwvtemnrgul3qaliq57ujfwnmfo5bsrzb3pfu6r7.py", line 48, in <module>
2025-12-04T10:35:20.8483759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8483899Z     kernel.precompile(
2025-12-04T10:35:20.8484375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8484476Z     self._precompile_worker()
2025-12-04T10:35:20.8484988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8485143Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8485657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8485826Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8486254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8486467Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8486839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8487125Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8487320Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8487575Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8487658Z ^
2025-12-04T10:35:20.8488045Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8488050Z 
2025-12-04T10:35:20.8488664Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8488668Z 
2025-12-04T10:35:20.8488674Z 
2025-12-04T10:35:20.8488858Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8489538Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8489551Z 
2025-12-04T10:35:20.8489777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8489954Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8490088Z frames [('total', 1)]
2025-12-04T10:35:20.8490184Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8490384Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8490580Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8490664Z graph_break []
2025-12-04T10:35:20.8490840Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8490936Z frames [('total', 1)]
2025-12-04T10:35:20.8491032Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8491220Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8491458Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8491535Z graph_break []
2025-12-04T10:35:20.8491719Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8491804Z frames [('total', 1)]
2025-12-04T10:35:20.8491896Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8492086Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8492283Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8492412Z graph_break []
2025-12-04T10:35:20.8492985Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml -
2025-12-04T10:35:20.8493130Z =========================== short test summary info ============================
2025-12-04T10:35:20.8493840Z FAILED [0.5429s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8494106Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8494174Z ^
2025-12-04T10:35:20.8494576Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8494582Z 
2025-12-04T10:35:20.8495187Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8495192Z 
2025-12-04T10:35:20.8495196Z 
2025-12-04T10:35:20.8495389Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8496076Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8496083Z 
2025-12-04T10:35:20.8496317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8496468Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8496636Z ================== 1 failed, 187 deselected, 2 rerun in 3.36s ==================
2025-12-04T10:35:20.8496723Z Got exit code 1
2025-12-04T10:35:20.8496817Z Retrying single test...
2025-12-04T10:35:20.8497225Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml
2025-12-04T10:35:20.8497371Z ============================= test session starts ==============================
2025-12-04T10:35:20.8497675Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8497781Z cachedir: .pytest_cache
2025-12-04T10:35:20.8498230Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8498340Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8498439Z configfile: pytest.ini
2025-12-04T10:35:20.8498896Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8499130Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8499800Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8499901Z Running 1 items in this shard
2025-12-04T10:35:20.8499905Z 
2025-12-04T10:35:20.8500883Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8501536Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8502056Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8502538Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8503016Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8503401Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8503877Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8504322Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8504814Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8505198Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8505704Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8506203Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8506680Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8506993Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8508861Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8509334Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8510243Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8510791Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8511557Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8512228Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8512981Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8513651Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8514225Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8514865Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8515186Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8516058Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8516168Z ('RERUN', {'yellow': True}) [2.1800s] [100%]
2025-12-04T10:35:20.8517186Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8517843Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8518310Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8518791Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8519213Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8519584Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8520046Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8520425Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8520918Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8521292Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8521778Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8522228Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8522691Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8523003Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8524498Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8525039Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8525940Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8526524Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8527284Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8527904Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8528664Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8529357Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8529886Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8530522Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8530834Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8531599Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8531707Z ('RERUN', {'yellow': True}) [0.5462s] [100%]
2025-12-04T10:35:20.8532674Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8533309Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8533776Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8534259Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8534680Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8535052Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8535514Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8535959Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8536467Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8536836Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8537322Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8537770Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8538281Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8538583Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8540096Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8540599Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8541485Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8542021Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8542776Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8543358Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8544111Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8544772Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8545293Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8545973Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8546288Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8547049Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8547142Z FAILED [0.5408s] [100%]
2025-12-04T10:35:20.8547146Z 
2025-12-04T10:35:20.8547264Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8547572Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8547676Z Traceback (most recent call last):
2025-12-04T10:35:20.8548019Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8548140Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8548550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8548759Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8549196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8549395Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8549826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8549945Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8550396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8550713Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8551152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8551313Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8551716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8551816Z     return self._compile_to_module()
2025-12-04T10:35:20.8552224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8552356Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8552793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8552903Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8553321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8553513Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8554015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8554116Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8554552Z   File "/tmp/tmp9u3haqyo/q5/cq5ggv6tyjmwulg5umff2z5ftgbv4akghqktnjyylfhgqd7scnk3.py", line 48, in <module>
2025-12-04T10:35:20.8554946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8555031Z     kernel.precompile(
2025-12-04T10:35:20.8555506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8555600Z     self._precompile_worker()
2025-12-04T10:35:20.8556115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8556261Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8556764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8556930Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8557310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8557512Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8557932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8558211Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8558409Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8558666Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8558735Z ^
2025-12-04T10:35:20.8559125Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8559130Z 
2025-12-04T10:35:20.8559736Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8559784Z 
2025-12-04T10:35:20.8559788Z 
2025-12-04T10:35:20.8559969Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8560649Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8560654Z 
2025-12-04T10:35:20.8560926Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8561103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8561186Z frames [('total', 1)]
2025-12-04T10:35:20.8561345Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8561543Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8561730Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8561814Z graph_break []
2025-12-04T10:35:20.8562079Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8562177Z Traceback (most recent call last):
2025-12-04T10:35:20.8562520Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8562639Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8563057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8563261Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8563695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8563864Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8564298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8564416Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8564864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8565135Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8565583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8565701Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8566109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8566205Z     return self._compile_to_module()
2025-12-04T10:35:20.8566610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8566751Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8567187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8567291Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8567758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8567954Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8568458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8568561Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8568991Z   File "/tmp/tmp6s2bahjm/sc/csciw34bkacuv2osa6cwp2teninfsi36h2hgq2muzcbxdy22dtxq.py", line 48, in <module>
2025-12-04T10:35:20.8569383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8569511Z     kernel.precompile(
2025-12-04T10:35:20.8569981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8570084Z     self._precompile_worker()
2025-12-04T10:35:20.8570591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8570782Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8571287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8571492Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8571874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8572077Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8572449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8572730Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8572920Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8573179Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8573251Z ^
2025-12-04T10:35:20.8573636Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8573645Z 
2025-12-04T10:35:20.8574253Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8574261Z 
2025-12-04T10:35:20.8574264Z 
2025-12-04T10:35:20.8574439Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8575117Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8575122Z 
2025-12-04T10:35:20.8575344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8575524Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8575608Z frames [('total', 1)]
2025-12-04T10:35:20.8575702Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8575904Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8576088Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8576167Z graph_break []
2025-12-04T10:35:20.8576343Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8576423Z frames [('total', 1)]
2025-12-04T10:35:20.8576514Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8576692Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8576885Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8576963Z graph_break []
2025-12-04T10:35:20.8577125Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8577390Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.8577499Z Traceback (most recent call last):
2025-12-04T10:35:20.8577835Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8577959Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8578366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8578612Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8579099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8579256Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8579689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8579808Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8580305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8580580Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8581058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8581184Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8581595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8581689Z     return self._compile_to_module()
2025-12-04T10:35:20.8582112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8582243Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8582678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8582787Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8583203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8583402Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8583906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8584009Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8584440Z   File "/tmp/tmp2kz9efe3/ft/cftx3qhvy2fdl5dt5qnijw2cononx6e2pf346dkblcuqnkzclkf3.py", line 48, in <module>
2025-12-04T10:35:20.8584832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8584915Z     kernel.precompile(
2025-12-04T10:35:20.8585398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8585492Z     self._precompile_worker()
2025-12-04T10:35:20.8586047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8586195Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8586701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8586865Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8587240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8587489Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8587867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8588145Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8588337Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8588599Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8588671Z ^
2025-12-04T10:35:20.8589104Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8589109Z 
2025-12-04T10:35:20.8589708Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8589712Z 
2025-12-04T10:35:20.8589719Z 
2025-12-04T10:35:20.8589903Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8590617Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8590622Z 
2025-12-04T10:35:20.8590846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8591062Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8591143Z frames [('total', 1)]
2025-12-04T10:35:20.8591242Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8591438Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8591622Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8591702Z graph_break []
2025-12-04T10:35:20.8591881Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8591961Z frames [('total', 1)]
2025-12-04T10:35:20.8592055Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8592235Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8592437Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8592511Z graph_break []
2025-12-04T10:35:20.8592683Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8592766Z frames [('total', 1)]
2025-12-04T10:35:20.8592855Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8593036Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8593229Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8593307Z graph_break []
2025-12-04T10:35:20.8593862Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml -
2025-12-04T10:35:20.8594003Z =========================== short test summary info ============================
2025-12-04T10:35:20.8594664Z FAILED [0.5408s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8594924Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8594993Z ^
2025-12-04T10:35:20.8595377Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8595386Z 
2025-12-04T10:35:20.8595988Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8595993Z 
2025-12-04T10:35:20.8595997Z 
2025-12-04T10:35:20.8596174Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8596897Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8596902Z 
2025-12-04T10:35:20.8597125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8597273Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8597441Z ================== 1 failed, 187 deselected, 2 rerun in 3.30s ==================
2025-12-04T10:35:20.8597518Z Got exit code 1
2025-12-04T10:35:20.8597985Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.8598460Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.8598862Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml
2025-12-04T10:35:20.8598995Z ============================= test session starts ==============================
2025-12-04T10:35:20.8599349Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8599443Z cachedir: .pytest_cache
2025-12-04T10:35:20.8599886Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8600030Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8600119Z configfile: pytest.ini
2025-12-04T10:35:20.8600575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8600772Z collecting ... collected 188 items / 56 deselected / 132 selected
2025-12-04T10:35:20.8600886Z stepcurrent: skipping 56 already run items.
2025-12-04T10:35:20.8600976Z Running 132 items in this shard
2025-12-04T10:35:20.8600980Z 
2025-12-04T10:35:20.8601983Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8602623Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8603091Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8603568Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8603991Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8604361Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8604825Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8605204Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8605685Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8606058Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8606540Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8607029Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8607499Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8608058Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8609495Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8610056Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8611070Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8611646Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8612514Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8613140Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8613948Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8614658Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8615216Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8615937Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8616293Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8617112Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8617233Z ('RERUN', {'yellow': True}) [2.1125s] [  0%]
2025-12-04T10:35:20.8618290Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8618982Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8619496Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8619976Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8620456Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8620824Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8621286Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8621667Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8622191Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8622564Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8623045Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8623535Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8623997Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8624345Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8625773Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8626235Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8627127Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8627663Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8628422Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8628999Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8629755Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8630413Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8630938Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8631574Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8631924Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8632689Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8632796Z ('RERUN', {'yellow': True}) [0.5621s] [  0%]
2025-12-04T10:35:20.8633786Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8634460Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8634924Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8635400Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8635857Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8636279Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8636780Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8637160Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8637639Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8638013Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8638496Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8638942Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8639409Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8639713Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8641137Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8641603Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8642491Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8643029Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8643827Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8644414Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8645163Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8645828Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8646464Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8647100Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8647451Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8648212Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8648337Z FAILED [0.5672s] [  0%]
2025-12-04T10:35:20.8648342Z 
2025-12-04T10:35:20.8648456Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8648736Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8648838Z Traceback (most recent call last):
2025-12-04T10:35:20.8649174Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8649297Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8649707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8649916Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8650352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8650516Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8650952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8651071Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8651522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8651792Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8652233Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8652353Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8652760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8652859Z     return self._compile_to_module()
2025-12-04T10:35:20.8653271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8653403Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8653840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8657681Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8658194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8658398Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8658907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8659013Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8659524Z   File "/tmp/tmpq0flfvi1/5w/c5w2caww7qw3cyy7psustrl2ltcfrwhgdetbf5cqse2ozgbeny5k.py", line 48, in <module>
2025-12-04T10:35:20.8659924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8660059Z     kernel.precompile(
2025-12-04T10:35:20.8660545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8660644Z     self._precompile_worker()
2025-12-04T10:35:20.8661162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8661309Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8661856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8662036Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8662459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8662675Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8663050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8663332Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8663528Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8663785Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8663855Z ^
2025-12-04T10:35:20.8664249Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8664256Z 
2025-12-04T10:35:20.8664863Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8664871Z 
2025-12-04T10:35:20.8664875Z 
2025-12-04T10:35:20.8665066Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8665763Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8665768Z 
2025-12-04T10:35:20.8666000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8666183Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8666269Z frames [('total', 1)]
2025-12-04T10:35:20.8666375Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8666577Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8666763Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8666854Z graph_break []
2025-12-04T10:35:20.8667132Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8667241Z Traceback (most recent call last):
2025-12-04T10:35:20.8667584Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8667704Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8668122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8668376Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8668816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8668987Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8669420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8669544Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8669998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8670316Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8670762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8670882Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8671293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8671432Z     return self._compile_to_module()
2025-12-04T10:35:20.8671841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8672017Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8672454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8672561Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8672985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8673182Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8673687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8673789Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8674226Z   File "/tmp/tmph40w8cnw/pf/cpfiyr5f7fc3iejhwarchvpj45snfoajypagfyswrxkahe2jdhlh.py", line 48, in <module>
2025-12-04T10:35:20.8674631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8674725Z     kernel.precompile(
2025-12-04T10:35:20.8675204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8675301Z     self._precompile_worker()
2025-12-04T10:35:20.8675831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8676008Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8676517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8676683Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8677066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8677273Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8677653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8677935Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8678129Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8678389Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8678457Z ^
2025-12-04T10:35:20.8678898Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8678904Z 
2025-12-04T10:35:20.8679514Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8679519Z 
2025-12-04T10:35:20.8679523Z 
2025-12-04T10:35:20.8679701Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8680397Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8680442Z 
2025-12-04T10:35:20.8680665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8680846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8680930Z frames [('total', 1)]
2025-12-04T10:35:20.8681028Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8681234Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8681469Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8681556Z graph_break []
2025-12-04T10:35:20.8681729Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8681810Z frames [('total', 1)]
2025-12-04T10:35:20.8681953Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8682135Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8682327Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8682411Z graph_break []
2025-12-04T10:35:20.8682527Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8682805Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8682911Z Traceback (most recent call last):
2025-12-04T10:35:20.8683255Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8683384Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8683800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8684007Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8684452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8684616Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8685052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8685169Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8685623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8685905Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8686349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8686468Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8686880Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8686982Z     return self._compile_to_module()
2025-12-04T10:35:20.8687401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8687538Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8687972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8688153Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8688576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8688778Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8689274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8689380Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8689810Z   File "/tmp/tmpj22crl5u/ls/cls5hdx2u5fkcjbxp6gkqmeb3atdj64t735b5h4xksok2uhuhdww.py", line 48, in <module>
2025-12-04T10:35:20.8690243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8690334Z     kernel.precompile(
2025-12-04T10:35:20.8690811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8690906Z     self._precompile_worker()
2025-12-04T10:35:20.8691464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8691612Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8692115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8692323Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8692707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8692920Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8693404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8693786Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8694049Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8694392Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8694495Z ^
2025-12-04T10:35:20.8694902Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8694910Z 
2025-12-04T10:35:20.8695521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8695530Z 
2025-12-04T10:35:20.8695533Z 
2025-12-04T10:35:20.8695723Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8696412Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8696417Z 
2025-12-04T10:35:20.8696643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8696823Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8696903Z frames [('total', 1)]
2025-12-04T10:35:20.8697001Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8697203Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8697386Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8697471Z graph_break []
2025-12-04T10:35:20.8697646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8697736Z frames [('total', 1)]
2025-12-04T10:35:20.8697831Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8698015Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8698278Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8698359Z graph_break []
2025-12-04T10:35:20.8698532Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8698623Z frames [('total', 1)]
2025-12-04T10:35:20.8698716Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8698895Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8699159Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8699242Z graph_break []
2025-12-04T10:35:20.8699805Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml -
2025-12-04T10:35:20.8699993Z =========================== short test summary info ============================
2025-12-04T10:35:20.8700675Z FAILED [0.5672s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8700941Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8701009Z ^
2025-12-04T10:35:20.8701452Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8701457Z 
2025-12-04T10:35:20.8702059Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8702102Z 
2025-12-04T10:35:20.8702106Z 
2025-12-04T10:35:20.8702293Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8702990Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8702994Z 
2025-12-04T10:35:20.8703218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8703377Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8703547Z ================== 1 failed, 56 deselected, 2 rerun in 3.28s ===================
2025-12-04T10:35:20.8703625Z Got exit code 1
2025-12-04T10:35:20.8703720Z Retrying single test...
2025-12-04T10:35:20.8704119Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml
2025-12-04T10:35:20.8704262Z ============================= test session starts ==============================
2025-12-04T10:35:20.8704558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8704651Z cachedir: .pytest_cache
2025-12-04T10:35:20.8705104Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8705207Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8705298Z configfile: pytest.ini
2025-12-04T10:35:20.8705768Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8705958Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8706581Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8706675Z Running 1 items in this shard
2025-12-04T10:35:20.8706680Z 
2025-12-04T10:35:20.8707918Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8708651Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8709120Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8709604Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8710029Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8710399Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8710920Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8711301Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8711791Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8712214Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8712708Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8713210Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8713673Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8713989Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8715433Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8715896Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8716797Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8717339Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8718096Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8718682Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8719437Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8720103Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8720679Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8721318Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8721632Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8722390Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8722539Z ('RERUN', {'yellow': True}) [2.0963s] [100%]
2025-12-04T10:35:20.8723532Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8724204Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8724680Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8725227Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8725654Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8726023Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8726488Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8726869Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8727350Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8727733Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8728211Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8728665Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8729135Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8729438Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8730872Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8731331Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8732273Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8732808Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8733568Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8734156Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8734945Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8735613Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8736223Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8736869Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8737215Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8737994Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8738103Z ('RERUN', {'yellow': True}) [0.5626s] [100%]
2025-12-04T10:35:20.8739136Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8739779Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8740243Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8740733Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8741153Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8741519Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8741993Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8742369Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8742863Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8743231Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8743715Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8744223Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8744689Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8744999Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8746426Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8746933Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8747866Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8748406Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8749200Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8749778Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8750537Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8751194Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8751722Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8752359Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8752677Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8753436Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8753521Z FAILED [0.5700s] [100%]
2025-12-04T10:35:20.8753525Z 
2025-12-04T10:35:20.8753651Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8753929Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8754039Z Traceback (most recent call last):
2025-12-04T10:35:20.8754385Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8754506Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8754932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8755138Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8755619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8755787Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8756221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8756343Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8756793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8757067Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8757555Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8757677Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8758098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8758198Z     return self._compile_to_module()
2025-12-04T10:35:20.8758651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8758795Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8759237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8759390Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8759817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8760014Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8760524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8760633Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8761080Z   File "/tmp/tmpkcnluhrz/dq/cdqy4cnsahg4ljqdh4nqllbg7ybyholsnncxdr5ckvmm6yoaodzl.py", line 48, in <module>
2025-12-04T10:35:20.8761490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8761584Z     kernel.precompile(
2025-12-04T10:35:20.8762061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8762157Z     self._precompile_worker()
2025-12-04T10:35:20.8762664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8762822Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8763333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8763504Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8763890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8764096Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8764477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8764764Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8764956Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8765226Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8765302Z ^
2025-12-04T10:35:20.8765691Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8765704Z 
2025-12-04T10:35:20.8766364Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8766370Z 
2025-12-04T10:35:20.8766374Z 
2025-12-04T10:35:20.8766560Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8767260Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8767267Z 
2025-12-04T10:35:20.8767495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8767724Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8767809Z frames [('total', 1)]
2025-12-04T10:35:20.8767903Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8768118Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8768308Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8768385Z graph_break []
2025-12-04T10:35:20.8768678Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8768851Z Traceback (most recent call last):
2025-12-04T10:35:20.8769196Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8769317Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8769773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8769992Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8770429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8770595Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8771030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8771151Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8771619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8771889Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8772335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8772466Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8772875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8772985Z     return self._compile_to_module()
2025-12-04T10:35:20.8773395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8773528Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8773972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8774078Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8774506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8774701Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8775198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8775312Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8775748Z   File "/tmp/tmp820zqs63/ic/cicqod7q7sby7qdsaeiquh7rxvgjs7ril7n3gxahaeewuobzbbus.py", line 48, in <module>
2025-12-04T10:35:20.8776322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8776416Z     kernel.precompile(
2025-12-04T10:35:20.8776891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8776989Z     self._precompile_worker()
2025-12-04T10:35:20.8777493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8777642Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8778196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8778358Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8778741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8778951Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8779426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8779717Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8779911Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8780216Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8780296Z ^
2025-12-04T10:35:20.8780684Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8780692Z 
2025-12-04T10:35:20.8781304Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8781309Z 
2025-12-04T10:35:20.8781315Z 
2025-12-04T10:35:20.8781498Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8782202Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8782207Z 
2025-12-04T10:35:20.8782434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8782618Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8782710Z frames [('total', 1)]
2025-12-04T10:35:20.8782810Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8783009Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8783199Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8783278Z graph_break []
2025-12-04T10:35:20.8783459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8783546Z frames [('total', 1)]
2025-12-04T10:35:20.8783635Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8783831Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8784027Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8784105Z graph_break []
2025-12-04T10:35:20.8784232Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8784515Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8784618Z Traceback (most recent call last):
2025-12-04T10:35:20.8784962Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8785082Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8785500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8785752Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8786242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8786407Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8786837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8786966Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8787414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8787724Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8788171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8788292Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8788711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8788848Z     return self._compile_to_module()
2025-12-04T10:35:20.8789258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8789439Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8789873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8789980Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8790405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8790593Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8791096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8791199Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8791632Z   File "/tmp/tmpjm0ifcgj/db/cdbjs3al7zf5ry4erh6tyo76syihbnegt37mrf3tcwqxlsjtr23n.py", line 48, in <module>
2025-12-04T10:35:20.8792025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8792115Z     kernel.precompile(
2025-12-04T10:35:20.8792591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8792684Z     self._precompile_worker()
2025-12-04T10:35:20.8793188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8793340Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8793843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8794010Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8794397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8794599Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8794977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8795259Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8795447Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8795731Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8795815Z ^
2025-12-04T10:35:20.8796253Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8796264Z 
2025-12-04T10:35:20.8796960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8796965Z 
2025-12-04T10:35:20.8796969Z 
2025-12-04T10:35:20.8797147Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8797839Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8797891Z 
2025-12-04T10:35:20.8798111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8798293Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8798372Z frames [('total', 1)]
2025-12-04T10:35:20.8798467Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8798673Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8798895Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8798974Z graph_break []
2025-12-04T10:35:20.8799152Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8799236Z frames [('total', 1)]
2025-12-04T10:35:20.8799381Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8799558Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8799755Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8799836Z graph_break []
2025-12-04T10:35:20.8800009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8800090Z frames [('total', 1)]
2025-12-04T10:35:20.8800182Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8800366Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8800562Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8800639Z graph_break []
2025-12-04T10:35:20.8801194Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml -
2025-12-04T10:35:20.8801334Z =========================== short test summary info ============================
2025-12-04T10:35:20.8802011Z FAILED [0.5700s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8802272Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8802342Z ^
2025-12-04T10:35:20.8802726Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8802730Z 
2025-12-04T10:35:20.8803340Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8803345Z 
2025-12-04T10:35:20.8803351Z 
2025-12-04T10:35:20.8803528Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8804212Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8804222Z 
2025-12-04T10:35:20.8804442Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8804590Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8804766Z ================== 1 failed, 187 deselected, 2 rerun in 3.26s ==================
2025-12-04T10:35:20.8804844Z Got exit code 1
2025-12-04T10:35:20.8804932Z Retrying single test...
2025-12-04T10:35:20.8805380Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml
2025-12-04T10:35:20.8805517Z ============================= test session starts ==============================
2025-12-04T10:35:20.8805812Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8805897Z cachedir: .pytest_cache
2025-12-04T10:35:20.8806346Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8806451Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8806605Z configfile: pytest.ini
2025-12-04T10:35:20.8807071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8807255Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8808038Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8808132Z Running 1 items in this shard
2025-12-04T10:35:20.8808207Z 
2025-12-04T10:35:20.8809267Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8810014Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8810515Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8811029Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8811482Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8811875Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8812370Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8812772Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8813292Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8813693Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8814206Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8814687Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8815181Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8815515Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8817136Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8817599Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8818497Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8819073Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8819896Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8820473Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8821267Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8821928Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8822501Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8823148Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8823456Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8824225Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8824331Z ('RERUN', {'yellow': True}) [2.1009s] [100%]
2025-12-04T10:35:20.8825318Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8825950Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8826411Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8826895Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8827318Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8827696Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8828156Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8828530Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8829062Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8829435Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8829924Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8830370Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8830841Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8831190Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8832655Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8833124Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8834052Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8834592Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8835348Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8835983Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8836735Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8837398Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8837916Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8838552Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8838863Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8839617Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8839729Z ('RERUN', {'yellow': True}) [0.5651s] [100%]
2025-12-04T10:35:20.8840710Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8841381Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8841849Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8842327Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8842750Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8843153Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8843616Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8843990Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8844513Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8844895Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8845373Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8845866Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8846380Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8846679Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
2025-12-04T10:35:20.8848106Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8848566Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8849458Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8849989Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8850754Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8851328Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8852081Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8852734Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8853318Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8853958Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8854262Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8855025Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8855146Z FAILED [0.5674s] [100%]
2025-12-04T10:35:20.8855150Z 
2025-12-04T10:35:20.8855268Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8855547Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8855646Z Traceback (most recent call last):
2025-12-04T10:35:20.8855989Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8856155Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8856569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8856830Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8857261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8857421Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8857854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8857973Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8858438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8858708Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8859195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8859324Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8859729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8859832Z     return self._compile_to_module()
2025-12-04T10:35:20.8860238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8860368Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8860808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8860911Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8861330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8861528Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8862025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8862129Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8862991Z   File "/tmp/tmp4_w4wfhy/zg/czg6fy373v4kzjajvi2xsq24v33lkfuimgpa4bzhv4ehj4bx3ne2.py", line 48, in <module>
2025-12-04T10:35:20.8863385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8863477Z     kernel.precompile(
2025-12-04T10:35:20.8864044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8864144Z     self._precompile_worker()
2025-12-04T10:35:20.8864653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8864802Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8865316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8865478Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8865943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8866158Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8866530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8866825Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8867056Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8867314Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8867392Z ^
2025-12-04T10:35:20.8867819Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8867824Z 
2025-12-04T10:35:20.8868433Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8868440Z 
2025-12-04T10:35:20.8868444Z 
2025-12-04T10:35:20.8868626Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8869323Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8869328Z 
2025-12-04T10:35:20.8869549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8869727Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8869811Z frames [('total', 1)]
2025-12-04T10:35:20.8869906Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8870102Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8870287Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8870367Z graph_break []
2025-12-04T10:35:20.8870647Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8870745Z Traceback (most recent call last):
2025-12-04T10:35:20.8871090Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8871217Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8871630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8871835Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8872503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8872675Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8873113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8873231Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8873681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8874007Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8874448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8874572Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8874976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8875075Z     return self._compile_to_module()
2025-12-04T10:35:20.8875488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8875666Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8876099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8876210Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8876629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8876824Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8877360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8877463Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8877954Z   File "/tmp/tmpoeoji2t9/pw/cpwckhent4frjpzfsvfcqqjiwoshhlizh2s5yaa5id7hop6iyw5b.py", line 48, in <module>
2025-12-04T10:35:20.8878342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8878438Z     kernel.precompile(
2025-12-04T10:35:20.8878907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8879000Z     self._precompile_worker()
2025-12-04T10:35:20.8879518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8879664Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8880167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8880336Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8880710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8880915Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8881283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8881560Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8881759Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8882013Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8882085Z ^
2025-12-04T10:35:20.8882471Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8882476Z 
2025-12-04T10:35:20.8883078Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8883086Z 
2025-12-04T10:35:20.8883091Z 
2025-12-04T10:35:20.8883274Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8883958Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8883963Z 
2025-12-04T10:35:20.8884242Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8884422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8884505Z frames [('total', 1)]
2025-12-04T10:35:20.8884604Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8884806Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8885005Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8885084Z graph_break []
2025-12-04T10:35:20.8885259Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8885387Z frames [('total', 1)]
2025-12-04T10:35:20.8885478Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8885677Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8885908Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8885986Z graph_break []
2025-12-04T10:35:20.8886105Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8886385Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8886529Z Traceback (most recent call last):
2025-12-04T10:35:20.8886883Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8887069Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8887479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8887694Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8888136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8888300Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8888743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8888860Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8889314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8889583Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8890028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8890152Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8890561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8890667Z     return self._compile_to_module()
2025-12-04T10:35:20.8891082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8891214Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8891655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8891758Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8892181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8892377Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8892874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8892984Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8893413Z   File "/tmp/tmp9088zri0/jt/cjtdplg4g3nym6jmjgsimctqrfgfr62t2lqpd6quku2hp6gzqcfz.py", line 48, in <module>
2025-12-04T10:35:20.8893855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8893948Z     kernel.precompile(
2025-12-04T10:35:20.8894422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8894521Z     self._precompile_worker()
2025-12-04T10:35:20.8895025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8895176Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8895730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8895892Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8896275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8896485Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8896898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8897186Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8897376Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8897671Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8897746Z ^
2025-12-04T10:35:20.8898132Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8898139Z 
2025-12-04T10:35:20.8898750Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8898754Z 
2025-12-04T10:35:20.8898761Z 
2025-12-04T10:35:20.8898938Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8899688Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8899693Z 
2025-12-04T10:35:20.8899915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8900095Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8900181Z frames [('total', 1)]
2025-12-04T10:35:20.8900278Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8900474Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8900660Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8900737Z graph_break []
2025-12-04T10:35:20.8900920Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8901002Z frames [('total', 1)]
2025-12-04T10:35:20.8901091Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8901282Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8901475Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8901550Z graph_break []
2025-12-04T10:35:20.8901726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8901808Z frames [('total', 1)]
2025-12-04T10:35:20.8901895Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8902076Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8902270Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8902350Z graph_break []
2025-12-04T10:35:20.8902901Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml -
2025-12-04T10:35:20.8903089Z =========================== short test summary info ============================
2025-12-04T10:35:20.8903775Z FAILED [0.5674s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8904033Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8904109Z ^
2025-12-04T10:35:20.8904495Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8904500Z 
2025-12-04T10:35:20.8905158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8905163Z 
2025-12-04T10:35:20.8905167Z 
2025-12-04T10:35:20.8905346Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8906058Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8906063Z 
2025-12-04T10:35:20.8906354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8906503Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8906712Z ================== 1 failed, 187 deselected, 2 rerun in 3.27s ==================
2025-12-04T10:35:20.8906793Z Got exit code 1
2025-12-04T10:35:20.8907267Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8907628Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.8908188Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml
2025-12-04T10:35:20.8908328Z ============================= test session starts ==============================
2025-12-04T10:35:20.8908627Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8908712Z cachedir: .pytest_cache
2025-12-04T10:35:20.8909171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8913189Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8913282Z configfile: pytest.ini
2025-12-04T10:35:20.8913755Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8913951Z collecting ... collected 188 items / 57 deselected / 131 selected
2025-12-04T10:35:20.8914073Z stepcurrent: skipping 57 already run items.
2025-12-04T10:35:20.8914180Z Running 131 items in this shard
2025-12-04T10:35:20.8914185Z 
2025-12-04T10:35:20.8914629Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [2.3500s] [  0%]
2025-12-04T10:35:20.8915079Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6600s] [  1%]
2025-12-04T10:35:20.8916071Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.8916723Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8917196Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8917770Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8918200Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.8918561Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.8919069Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8919517Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8920009Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.8920453Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.8920873Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.8921395Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.8921861Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.8922218Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.8923774Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8924230Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.8924968Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8925398Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.8926114Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.8926722Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.8927444Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8927877Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.8928599Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.8929151Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.8929930Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.8930643Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.8931357Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.8931996Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.8932721Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.8933367Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.8934129Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8934468Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.8935050Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.8935346Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.8935796Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.8936735Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8937267Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8938023Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8938596Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8939419Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8940077Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8940598Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.8941250Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8941708Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8942235Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8942653Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.8943020Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.8943521Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8944005Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8944359Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.8945060Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8945180Z ('RERUN', {'yellow': True}) [0.2030s] [  2%]
2025-12-04T10:35:20.8945728Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6007s] [  2%]
2025-12-04T10:35:20.8946160Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 FAILED [0.5834s] [  2%]
2025-12-04T10:35:20.8946287Z 
2025-12-04T10:35:20.8946416Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8946691Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.8946800Z Traceback (most recent call last):
2025-12-04T10:35:20.8947110Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.8947217Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.8947641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8947849Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8948282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8948453Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8948883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8949016Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8949473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8949749Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8950197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8950319Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8950727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8950826Z     return self._compile_to_module()
2025-12-04T10:35:20.8951243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8951385Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8951831Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8951939Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8952413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8952613Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8953126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8953231Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8953650Z   File "/tmp/tmp_dcbdldv/ft/cftymixla4jkfzzdywjnejngcxzvb4g2mbvpgv5nfcwizkzgc37a.py", line 51, in <module>
2025-12-04T10:35:20.8954067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8954199Z     kernel.precompile(
2025-12-04T10:35:20.8954697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8954798Z     self._precompile_worker()
2025-12-04T10:35:20.8955313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8955481Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8956086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8956253Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8956690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8956905Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8957290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8957576Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8957773Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.8958063Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8958167Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8958296Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8958389Z     xmask = xindex < xnumel
2025-12-04T10:35:20.8958467Z     x0 = xindex
2025-12-04T10:35:20.8958614Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8958715Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8958790Z            ^
2025-12-04T10:35:20.8959125Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8959133Z 
2025-12-04T10:35:20.8959746Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8959751Z 
2025-12-04T10:35:20.8959754Z 
2025-12-04T10:35:20.8959948Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8960633Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.8960638Z 
2025-12-04T10:35:20.8960867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8961054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8961141Z frames [('total', 1)]
2025-12-04T10:35:20.8961240Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.8961432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.8961833Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8961923Z graph_break []
2025-12-04T10:35:20.8962197Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.8962352Z Traceback (most recent call last):
2025-12-04T10:35:20.8962679Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.8962789Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.8963209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8963417Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8963853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8964066Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8964497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8964624Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8965081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8965394Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8965868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8966004Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8966469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8966585Z     return self._compile_to_module()
2025-12-04T10:35:20.8966998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8967141Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8967582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8967691Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8968121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8968313Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8968815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8968924Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8969353Z   File "/tmp/tmprx7n17gc/f2/cf2luty6z37x7p7zbbfqpubgqy747cohumtlkgt74sdrgi7tsfeo.py", line 83, in <module>
2025-12-04T10:35:20.8969743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.8969834Z     self._wait_futures(scope)
2025-12-04T10:35:20.8970259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.8970359Z     kernel = result.result()
2025-12-04T10:35:20.8970795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.8970929Z     return self.result_fn()
2025-12-04T10:35:20.8971454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.8971600Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.8972038Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.8972048Z 
2025-12-04T10:35:20.8972188Z Name=triton_poi_fused__to_copy_0
2025-12-04T10:35:20.8972293Z Traceback (most recent call last):
2025-12-04T10:35:20.8972648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8972743Z     return fn(*args, **kwargs)
2025-12-04T10:35:20.8973148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.8973381Z     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.8973726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8973818Z     return fn(*args, **kwargs)
2025-12-04T10:35:20.8974159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.8974337Z     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.8974773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.8975097Z     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.8975442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.8975674Z     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.8976088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.8976301Z     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.8976684Z ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8976729Z 
2025-12-04T10:35:20.8976948Z The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.8976955Z 
2025-12-04T10:35:20.8977060Z Traceback (most recent call last):
2025-12-04T10:35:20.8977515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.8977603Z     result = job()
2025-12-04T10:35:20.8978106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.8978234Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.8978705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.8978799Z     self._precompile_worker()
2025-12-04T10:35:20.8979393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8979540Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8980053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8980215Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8980596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8980814Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8981188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8981469Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8981635Z triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.8981899Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8982007Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8982119Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8982204Z     xmask = xindex < xnumel
2025-12-04T10:35:20.8982284Z     x0 = xindex
2025-12-04T10:35:20.8982427Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8982524Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8982650Z            ^
2025-12-04T10:35:20.8982982Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8982986Z 
2025-12-04T10:35:20.8982990Z 
2025-12-04T10:35:20.8983606Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8983614Z 
2025-12-04T10:35:20.8983618Z 
2025-12-04T10:35:20.8983801Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8984493Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.8984539Z 
2025-12-04T10:35:20.8984766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8985040Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8985138Z frames [('total', 1)]
2025-12-04T10:35:20.8985233Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.8985419Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.8985865Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8985946Z graph_break []
2025-12-04T10:35:20.8986178Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8986264Z frames [('total', 1)]
2025-12-04T10:35:20.8986357Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.8986547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.8987043Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.8987122Z graph_break []
2025-12-04T10:35:20.8987253Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8987528Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.8987642Z Traceback (most recent call last):
2025-12-04T10:35:20.8987954Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.8988057Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.8988483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8988694Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8989133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8989297Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8989729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8989856Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8990311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8990579Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8991029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8991151Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8991565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8991664Z     return self._compile_to_module()
2025-12-04T10:35:20.8992073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8992260Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8992702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8992810Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8993232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8993430Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8993939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8994086Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8994508Z   File "/tmp/tmp8a4fclai/ud/cud2d6u6wlihn5nlz5bqo3u4g57uq3ib3xif7lvvtazv7l5l6as7.py", line 83, in <module>
2025-12-04T10:35:20.8994902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.8994998Z     self._wait_futures(scope)
2025-12-04T10:35:20.8995464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.8995562Z     kernel = result.result()
2025-12-04T10:35:20.8995936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.8996073Z     return self.result_fn()
2025-12-04T10:35:20.8996478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.8996589Z     raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.8996921Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.8996926Z 
2025-12-04T10:35:20.8997023Z Name=triton_poi_fused__to_copy_0
2025-12-04T10:35:20.8997131Z Traceback (most recent call last):
2025-12-04T10:35:20.8997482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8997575Z     return fn(*args, **kwargs)
2025-12-04T10:35:20.8997914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.8998137Z     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.8998489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8998581Z     return fn(*args, **kwargs)
2025-12-04T10:35:20.8998923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.8999105Z     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.8999463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.8999790Z     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9000136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9000353Z     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9000702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9000911Z     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9001292Z ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9001298Z 
2025-12-04T10:35:20.9001507Z The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9001511Z 
2025-12-04T10:35:20.9001613Z Traceback (most recent call last):
2025-12-04T10:35:20.9002125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.9002205Z     result = job()
2025-12-04T10:35:20.9002714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.9002841Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.9003314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.9003409Z     self._precompile_worker()
2025-12-04T10:35:20.9003926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9004122Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9004637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9004804Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9005181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9005432Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9005838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9006187Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9006346Z triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9006614Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9006720Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9006831Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9006917Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9006999Z     x0 = xindex
2025-12-04T10:35:20.9007143Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9007246Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9007322Z            ^
2025-12-04T10:35:20.9007654Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9007659Z 
2025-12-04T10:35:20.9007663Z 
2025-12-04T10:35:20.9008450Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9008459Z 
2025-12-04T10:35:20.9008465Z 
2025-12-04T10:35:20.9008651Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9009339Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9009344Z 
2025-12-04T10:35:20.9009568Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9009745Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9009837Z frames [('total', 1)]
2025-12-04T10:35:20.9009928Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9010120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9010522Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9010606Z graph_break []
2025-12-04T10:35:20.9010789Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9010877Z frames [('total', 1)]
2025-12-04T10:35:20.9010973Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9011163Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9011771Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.9011856Z graph_break []
2025-12-04T10:35:20.9012035Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9012114Z frames [('total', 1)]
2025-12-04T10:35:20.9012215Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9012396Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9012900Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)]
2025-12-04T10:35:20.9013046Z graph_break []
2025-12-04T10:35:20.9013605Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml -
2025-12-04T10:35:20.9013757Z =========================== short test summary info ============================
2025-12-04T10:35:20.9014570Z FAILED [0.5834s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.9014575Z 
2025-12-04T10:35:20.9014726Z Name=triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9014839Z Traceback (most recent call last):
2025-12-04T10:35:20.9015195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9015341Z     return fn(*args, **kwargs)
2025-12-04T10:35:20.9015683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9015951Z     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9016315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9016407Z     return fn(*args, **kwargs)
2025-12-04T10:35:20.9016748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9016927Z     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9017287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9017615Z     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9017957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9018173Z     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9018516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9018725Z     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9019154Z ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9019164Z 
2025-12-04T10:35:20.9019371Z The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9019375Z 
2025-12-04T10:35:20.9019480Z Traceback (most recent call last):
2025-12-04T10:35:20.9019941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.9020024Z     result = job()
2025-12-04T10:35:20.9020531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.9020657Z     kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.9021131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.9021234Z     self._precompile_worker()
2025-12-04T10:35:20.9021789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9021937Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9022452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9022615Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9023004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9023211Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9023627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9023917Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9024071Z triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9024341Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9024450Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9024628Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9024723Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9024797Z     x0 = xindex
2025-12-04T10:35:20.9024940Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9025081Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9025154Z            ^
2025-12-04T10:35:20.9025485Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9025491Z 
2025-12-04T10:35:20.9025495Z 
2025-12-04T10:35:20.9026166Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9026170Z 
2025-12-04T10:35:20.9026174Z 
2025-12-04T10:35:20.9026357Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9027051Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9027056Z 
2025-12-04T10:35:20.9027278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9027436Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9027615Z ============= 1 failed, 2 passed, 57 deselected, 2 rerun in 4.43s ==============
2025-12-04T10:35:20.9027696Z Got exit code 1
2025-12-04T10:35:20.9027790Z Retrying single test...
2025-12-04T10:35:20.9028192Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml
2025-12-04T10:35:20.9028327Z ============================= test session starts ==============================
2025-12-04T10:35:20.9028628Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9028718Z cachedir: .pytest_cache
2025-12-04T10:35:20.9029165Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9029264Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9029349Z configfile: pytest.ini
2025-12-04T10:35:20.9029817Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9029998Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.9030605Z stepcurrent: skipping 59 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9030700Z Running 1 items in this shard
2025-12-04T10:35:20.9030704Z 
2025-12-04T10:35:20.9031693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9032347Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9032804Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9033319Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9033729Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9034087Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9034637Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9035080Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9035545Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9036024Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9036449Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9036912Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9037368Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9037669Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9039204Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9039667Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9040392Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9040827Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9041526Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9042129Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9042897Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9043325Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9044038Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9044575Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9045351Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9046091Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9046842Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9047432Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9048214Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9048799Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9049547Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9049851Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9050423Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9050720Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9051174Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9052054Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9052591Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9053339Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9053918Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9054659Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9055352Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9055880Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9056516Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9056978Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9057490Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9057903Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9058266Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9058799Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9059336Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9059724Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9060421Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9060529Z ('RERUN', {'yellow': True}) [1.7763s] [100%]
2025-12-04T10:35:20.9061469Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9062109Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9062563Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9063035Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9063447Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9063802Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9064301Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9064741Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9065166Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9065591Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9066013Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9066472Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9066971Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9067273Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9068801Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9069295Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9070021Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9070492Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9071190Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9071828Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9072552Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9072974Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9073691Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9074223Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9074957Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9075648Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9076360Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9076948Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9077656Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9078236Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9079026Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9079326Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9079898Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9080191Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9080646Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9081570Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9082103Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9082890Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9083463Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9084795Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9085449Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9086018Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9086661Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9087123Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9087591Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9088004Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9088362Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9088860Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9089305Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9089649Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9090345Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9090457Z ('RERUN', {'yellow': True}) [0.3294s] [100%]
2025-12-04T10:35:20.9091516Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9092162Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9092616Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9093086Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9093538Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9093894Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9094406Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9094910Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9095337Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9095767Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9096270Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9096737Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9097195Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9097496Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9099064Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9099525Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9100253Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9100678Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9101384Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9101986Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9102710Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9103184Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9103903Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9104436Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9105171Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9105905Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9106626Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9107256Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9108191Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9108843Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9109598Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9109906Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9110479Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9110771Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9111223Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9112100Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9112641Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9113395Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9113966Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9114719Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9115373Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9116007Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9116647Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9117108Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9117579Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9118048Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9118412Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9118909Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9119494Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9119837Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9120574Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9120661Z FAILED [0.3284s] [100%]
2025-12-04T10:35:20.9120666Z 
2025-12-04T10:35:20.9120782Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9121061Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9121160Z Traceback (most recent call last):
2025-12-04T10:35:20.9121475Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9121579Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9121993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9122200Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9122636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9122795Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9123227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9123343Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9123794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9124072Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9124514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9124643Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9125052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9125152Z     return self._compile_to_module()
2025-12-04T10:35:20.9125577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9125730Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9126192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9126344Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9126762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9126959Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9127455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9127563Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9128001Z   File "/tmp/tmptdhme109/dj/cdjqtqned3bxnvaezdbponghgy2hgmx6ssmso7ah2t3uhuqwyvdf.py", line 51, in <module>
2025-12-04T10:35:20.9128434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9128526Z     kernel.precompile(
2025-12-04T10:35:20.9128994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9129091Z     self._precompile_worker()
2025-12-04T10:35:20.9129637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9129787Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9130303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9130506Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9130885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9131092Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9131460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9131746Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9131944Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9132207Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9132309Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9132419Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9132505Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9132584Z     x0 = xindex
2025-12-04T10:35:20.9132722Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9132814Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9132893Z            ^
2025-12-04T10:35:20.9133217Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9133222Z 
2025-12-04T10:35:20.9133835Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9133840Z 
2025-12-04T10:35:20.9133843Z 
2025-12-04T10:35:20.9134023Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9134709Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9134717Z 
2025-12-04T10:35:20.9134937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9135113Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9135200Z frames [('total', 1)]
2025-12-04T10:35:20.9135294Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9135713Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9135932Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9136055Z graph_break []
2025-12-04T10:35:20.9136328Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9136431Z Traceback (most recent call last):
2025-12-04T10:35:20.9136739Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9136842Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9137253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9137461Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9137967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9138123Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9138557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9138675Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9139214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9139491Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9139970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9140093Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9140500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9140595Z     return self._compile_to_module()
2025-12-04T10:35:20.9141005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9141138Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9141572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9141682Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9142099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9142298Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9142791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9142896Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9143323Z   File "/tmp/tmp81bduxbe/s7/cs7x4qn472xwe2oluebbwe3u4t7ouf26zc2w7ghsbc3fxfid6nzg.py", line 51, in <module>
2025-12-04T10:35:20.9143716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9143807Z     kernel.precompile(
2025-12-04T10:35:20.9144277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9144370Z     self._precompile_worker()
2025-12-04T10:35:20.9144877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9145030Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9145531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9145696Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9146121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9146376Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9146748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9147031Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9147225Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9147488Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9147601Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9147710Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9147838Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9147913Z     x0 = xindex
2025-12-04T10:35:20.9148051Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9148148Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9148226Z            ^
2025-12-04T10:35:20.9148553Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9148558Z 
2025-12-04T10:35:20.9149209Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9149215Z 
2025-12-04T10:35:20.9149219Z 
2025-12-04T10:35:20.9149398Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9150114Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9150125Z 
2025-12-04T10:35:20.9150349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9150529Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9150620Z frames [('total', 1)]
2025-12-04T10:35:20.9150715Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9151116Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9151310Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9151386Z graph_break []
2025-12-04T10:35:20.9151564Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9151649Z frames [('total', 1)]
2025-12-04T10:35:20.9151740Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9151927Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9152324Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9152402Z graph_break []
2025-12-04T10:35:20.9152524Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9152800Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9152897Z Traceback (most recent call last):
2025-12-04T10:35:20.9153210Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9153311Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9153729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9153937Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9154373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9154538Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9154971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9155091Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9155594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9155917Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9156365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9156485Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9156888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9157034Z     return self._compile_to_module()
2025-12-04T10:35:20.9157445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9157579Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9158016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9158119Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9158582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9158773Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9159272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9159415Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9159848Z   File "/tmp/tmpjt4eng62/4e/c4eda7ev2o4trituhdtmd7aogonniqxncantpa6hwz7nvoqvx2bq.py", line 51, in <module>
2025-12-04T10:35:20.9160244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9160333Z     kernel.precompile(
2025-12-04T10:35:20.9160810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9160902Z     self._precompile_worker()
2025-12-04T10:35:20.9161405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9161552Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9162058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9162220Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9162604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9162804Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9163188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9163468Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9163663Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9163935Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9164030Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9164142Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9164228Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9164302Z     x0 = xindex
2025-12-04T10:35:20.9164437Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9164534Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9164601Z            ^
2025-12-04T10:35:20.9164930Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9164935Z 
2025-12-04T10:35:20.9165582Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9165588Z 
2025-12-04T10:35:20.9165591Z 
2025-12-04T10:35:20.9165778Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9166456Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9166464Z 
2025-12-04T10:35:20.9166683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9166908Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9166990Z frames [('total', 1)]
2025-12-04T10:35:20.9167080Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9167478Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9167658Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9167737Z graph_break []
2025-12-04T10:35:20.9167950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9168028Z frames [('total', 1)]
2025-12-04T10:35:20.9168121Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9168301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9168730Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9168813Z graph_break []
2025-12-04T10:35:20.9168988Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9169071Z frames [('total', 1)]
2025-12-04T10:35:20.9169160Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9169338Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9169729Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9169804Z graph_break []
2025-12-04T10:35:20.9170361Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml -
2025-12-04T10:35:20.9170504Z =========================== short test summary info ============================
2025-12-04T10:35:20.9171167Z FAILED [0.3284s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9171442Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9171539Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9175402Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9175510Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9175599Z     x0 = xindex
2025-12-04T10:35:20.9175746Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9175856Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9175934Z            ^
2025-12-04T10:35:20.9176284Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9176289Z 
2025-12-04T10:35:20.9176897Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9176906Z 
2025-12-04T10:35:20.9176912Z 
2025-12-04T10:35:20.9177101Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9177796Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9177802Z 
2025-12-04T10:35:20.9178120Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9178282Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9178456Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ==================
2025-12-04T10:35:20.9178537Z Got exit code 1
2025-12-04T10:35:20.9178633Z Retrying single test...
2025-12-04T10:35:20.9179095Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml
2025-12-04T10:35:20.9179245Z ============================= test session starts ==============================
2025-12-04T10:35:20.9179589Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9179679Z cachedir: .pytest_cache
2025-12-04T10:35:20.9180128Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9180235Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9180331Z configfile: pytest.ini
2025-12-04T10:35:20.9180840Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9181034Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.9181657Z stepcurrent: skipping 59 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9181794Z Running 1 items in this shard
2025-12-04T10:35:20.9181799Z 
2025-12-04T10:35:20.9182741Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9183392Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9183849Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9184323Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9184743Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9185107Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9185609Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9186050Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9186478Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9186908Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9187340Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9187801Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9188263Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9188567Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9190154Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9190618Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9191386Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9191822Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9192563Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9193161Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9193925Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9194352Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9195075Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9195613Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9196352Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9197045Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9197760Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9198360Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9199069Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9199652Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9200405Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9200713Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9201328Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9201628Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9202083Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9202964Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9203550Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9204298Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9204930Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9205677Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9206422Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9206956Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9207596Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9208218Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9208698Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9209122Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9209485Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9209988Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9210439Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9210784Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9211486Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9211596Z ('RERUN', {'yellow': True}) [1.7930s] [100%]
2025-12-04T10:35:20.9212538Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9213190Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9213726Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9214204Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9214614Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9214981Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9215543Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9216034Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9216469Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9216950Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9217381Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9217933Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9218392Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9218704Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9220276Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9220739Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9221470Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9221905Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9222606Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9223207Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9223933Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9224367Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9225087Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9225674Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9226421Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9227112Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9227873Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9228473Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9229226Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9229817Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9230606Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9230921Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9231503Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9231806Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9232266Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9233150Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9233692Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9234447Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9235031Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9235802Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9236486Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9237011Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9237695Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9238168Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9238642Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9239064Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9239426Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9239968Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9240426Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9240777Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9241523Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9241632Z ('RERUN', {'yellow': True}) [0.3347s] [100%]
2025-12-04T10:35:20.9242618Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9243270Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9243732Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9244220Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9244639Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9245012Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9245512Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9245986Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9246452Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9246887Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9247322Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9247782Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9248243Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9248566Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9250144Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9250608Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9251349Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9251832Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9252540Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9253182Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9253909Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9254387Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9255118Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9255657Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9256455Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9257158Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9257883Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9258484Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9259262Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9259872Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9260625Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9260945Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9261520Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9261876Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9262338Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9263222Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9263775Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9264576Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9265164Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9265977Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9266631Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9267199Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9267845Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9268317Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9268790Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9269213Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9269575Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9270074Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9270527Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9270876Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9271756Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9271884Z FAILED [0.3277s] [100%]
2025-12-04T10:35:20.9271891Z 
2025-12-04T10:35:20.9272056Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9272435Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9272577Z Traceback (most recent call last):
2025-12-04T10:35:20.9272912Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9273017Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9273436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9273721Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9274162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9274328Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9274768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9274895Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9275354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9275674Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9276113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9276249Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9276655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9276800Z     return self._compile_to_module()
2025-12-04T10:35:20.9277214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9277351Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9277840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9277951Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9278369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9278572Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9279073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9279185Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9279625Z   File "/tmp/tmpz27mbrio/fl/cfldzd5p5gcnn4h4wlqf2xbzc7442kpbkzeklebfxclchaxpg74g.py", line 51, in <module>
2025-12-04T10:35:20.9280020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9280119Z     kernel.precompile(
2025-12-04T10:35:20.9280598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9280707Z     self._precompile_worker()
2025-12-04T10:35:20.9281212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9281360Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9281875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9282046Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9282428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9282637Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9283009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9283300Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9283497Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9283761Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9283877Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9284040Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9284126Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9284207Z     x0 = xindex
2025-12-04T10:35:20.9284347Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9284450Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9284523Z            ^
2025-12-04T10:35:20.9284849Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9284857Z 
2025-12-04T10:35:20.9285570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9285620Z 
2025-12-04T10:35:20.9285624Z 
2025-12-04T10:35:20.9285812Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9286504Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9286509Z 
2025-12-04T10:35:20.9286731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9286955Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9287042Z frames [('total', 1)]
2025-12-04T10:35:20.9287138Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9287586Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9287772Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9287853Z graph_break []
2025-12-04T10:35:20.9288131Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9288235Z Traceback (most recent call last):
2025-12-04T10:35:20.9288546Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9288655Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9289071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9289291Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9289726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9289889Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9290331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9290449Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9290902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9291183Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9291622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9291751Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9292158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9292260Z     return self._compile_to_module()
2025-12-04T10:35:20.9292673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9292806Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9293252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9293359Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9293825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9294028Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9294524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9294635Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9295061Z   File "/tmp/tmp6qpep_xi/ck/cckdvde4aeuiapbenhrjxpbdqh3fxkid3huwmrfblucw5ztfub3w.py", line 51, in <module>
2025-12-04T10:35:20.9295459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9295677Z     kernel.precompile(
2025-12-04T10:35:20.9296152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9296245Z     self._precompile_worker()
2025-12-04T10:35:20.9296766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9296912Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9297467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9297635Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9298056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9298271Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9298650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9298946Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9299203Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9299480Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9299591Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9299706Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9299792Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9299872Z     x0 = xindex
2025-12-04T10:35:20.9300017Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9300115Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9300193Z            ^
2025-12-04T10:35:20.9300519Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9300527Z 
2025-12-04T10:35:20.9301145Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9301149Z 
2025-12-04T10:35:20.9301153Z 
2025-12-04T10:35:20.9301341Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9302029Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9302034Z 
2025-12-04T10:35:20.9302262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9302443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9302535Z frames [('total', 1)]
2025-12-04T10:35:20.9302638Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9303035Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9303227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9303309Z graph_break []
2025-12-04T10:35:20.9303550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9303637Z frames [('total', 1)]
2025-12-04T10:35:20.9303730Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9303922Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9304317Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9304400Z graph_break []
2025-12-04T10:35:20.9304528Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9304804Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9304983Z Traceback (most recent call last):
2025-12-04T10:35:20.9305301Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9305406Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9305833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9306040Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9306519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9306683Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9307151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9307279Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9307906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9308183Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9308635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9308757Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9309181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9309285Z     return self._compile_to_module()
2025-12-04T10:35:20.9309706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9309859Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9310301Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9310414Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9310846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9311053Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9311576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9311686Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9312116Z   File "/tmp/tmpgl4yqbld/w7/cw7v5yat3t2pzgsrj42o4p7sqcxta2w4ftba72uhnfptyzjfzn5d.py", line 51, in <module>
2025-12-04T10:35:20.9312525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9312620Z     kernel.precompile(
2025-12-04T10:35:20.9313105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9313203Z     self._precompile_worker()
2025-12-04T10:35:20.9313711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9313949Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9314462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9314627Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9315013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9315221Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9315617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9315961Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9316164Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9316442Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9316542Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9316667Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9316814Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9316894Z     x0 = xindex
2025-12-04T10:35:20.9317041Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9317139Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9317267Z            ^
2025-12-04T10:35:20.9317605Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9317610Z 
2025-12-04T10:35:20.9318225Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9318230Z 
2025-12-04T10:35:20.9318234Z 
2025-12-04T10:35:20.9318420Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9319114Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9319119Z 
2025-12-04T10:35:20.9319353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9319544Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9319632Z frames [('total', 1)]
2025-12-04T10:35:20.9319738Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9320130Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9320324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9320409Z graph_break []
2025-12-04T10:35:20.9320584Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9320668Z frames [('total', 1)]
2025-12-04T10:35:20.9320776Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9320956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9321354Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9321440Z graph_break []
2025-12-04T10:35:20.9321614Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9321702Z frames [('total', 1)]
2025-12-04T10:35:20.9321793Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9321972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9322366Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9322446Z graph_break []
2025-12-04T10:35:20.9323003Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml -
2025-12-04T10:35:20.9323193Z =========================== short test summary info ============================
2025-12-04T10:35:20.9323864Z FAILED [0.3277s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9324139Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9324245Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9324355Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9324443Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9324561Z     x0 = xindex
2025-12-04T10:35:20.9324704Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9324800Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9324870Z            ^
2025-12-04T10:35:20.9325204Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9325211Z 
2025-12-04T10:35:20.9325852Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9325857Z 
2025-12-04T10:35:20.9325861Z 
2025-12-04T10:35:20.9326041Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9326717Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9326760Z 
2025-12-04T10:35:20.9326990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9327143Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9327308Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ==================
2025-12-04T10:35:20.9327392Z Got exit code 1
2025-12-04T10:35:20.9327861Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9328213Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.9328620Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml
2025-12-04T10:35:20.9328758Z ============================= test session starts ==============================
2025-12-04T10:35:20.9329046Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9329142Z cachedir: .pytest_cache
2025-12-04T10:35:20.9329584Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9329687Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9329772Z configfile: pytest.ini
2025-12-04T10:35:20.9330230Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9330424Z collecting ... collected 188 items / 60 deselected / 128 selected
2025-12-04T10:35:20.9330540Z stepcurrent: skipping 60 already run items.
2025-12-04T10:35:20.9330630Z Running 128 items in this shard
2025-12-04T10:35:20.9330638Z 
2025-12-04T10:35:20.9331602Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9332245Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9332751Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9333223Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9333641Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9333999Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9334500Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9334983Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9335404Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9335839Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9336351Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9336810Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9337314Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9337613Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9339209Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9339661Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9340392Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9340818Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9341522Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9342121Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9342837Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9343265Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9343981Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9344593Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9345326Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9346020Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9346728Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9347356Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9348067Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9348683Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9349435Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9349771Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9350347Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9350640Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9351088Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9351971Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9352503Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9353257Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9353827Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9354573Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9355221Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9355740Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9356382Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9356881Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9357362Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9357774Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9358130Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9358635Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9359114Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9359459Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9360153Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9360302Z ('RERUN', {'yellow': True}) [1.7902s] [  0%]
2025-12-04T10:35:20.9361259Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9361938Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9362397Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9362865Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9363281Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9363638Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9364137Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9364578Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9365000Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9365433Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9365932Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9366448Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9366908Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9367205Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9368786Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9369239Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9369965Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9370388Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9371129Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9371728Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9372484Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9372947Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9373658Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9374198Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9374927Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9375620Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9376379Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9376964Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9377677Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9378255Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9379008Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9379350Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9379928Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9380220Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9380714Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9381603Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9382130Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9382880Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9383494Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9384241Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9384938Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9385519Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9386212Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9386666Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9387141Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9387555Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9387910Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9388412Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9388851Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9389196Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9389896Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9390002Z ('RERUN', {'yellow': True}) [0.3295s] [  0%]
2025-12-04T10:35:20.9390965Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9391602Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9392063Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9392572Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9392988Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9393347Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9393843Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9394284Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9394750Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9395180Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9395605Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9396151Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9396609Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9396942Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9398479Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9398932Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9399657Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9400081Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9400787Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9401384Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9402101Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9402524Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9403233Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9403771Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9404544Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9405237Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9405946Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9406533Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9407287Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9408035Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9408856Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9409230Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9409847Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9410165Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9410645Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9411596Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9412163Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9412978Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9413592Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9414395Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9415095Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9415651Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9416341Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9416830Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9417401Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9417818Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9418173Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9418672Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9419156Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9419561Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9420253Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9420335Z FAILED [0.3267s] [  0%]
2025-12-04T10:35:20.9420344Z 
2025-12-04T10:35:20.9420500Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9420783Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9420886Z Traceback (most recent call last):
2025-12-04T10:35:20.9421233Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9421339Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9421761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9421967Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9422403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9422562Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9422994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9423112Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9423562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9423833Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9424280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9424401Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9424808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9424905Z     return self._compile_to_module()
2025-12-04T10:35:20.9425315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9425454Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9425889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9426000Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9426415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9426612Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9427108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9427207Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9427686Z   File "/tmp/tmps26qny1n/ad/cadav5mroc2gis34vnfqicpsaedrnx7sybmi2gcwceto4d5kskxj.py", line 51, in <module>
2025-12-04T10:35:20.9428085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9428171Z     kernel.precompile(
2025-12-04T10:35:20.9428643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9428737Z     self._precompile_worker()
2025-12-04T10:35:20.9429240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9429430Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9429931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9430096Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9430478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9430745Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9431118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9431397Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9431634Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9431900Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9432001Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9432119Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9432203Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9432275Z     x0 = xindex
2025-12-04T10:35:20.9432414Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9432510Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9432581Z            ^
2025-12-04T10:35:20.9432911Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9432916Z 
2025-12-04T10:35:20.9433523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9433531Z 
2025-12-04T10:35:20.9433535Z 
2025-12-04T10:35:20.9433715Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9434407Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9434412Z 
2025-12-04T10:35:20.9434640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9434819Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9434902Z frames [('total', 1)]
2025-12-04T10:35:20.9434995Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9435397Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9435579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9435661Z graph_break []
2025-12-04T10:35:20.9435977Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9436089Z Traceback (most recent call last):
2025-12-04T10:35:20.9436397Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9436497Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9436909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9437159Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9437596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9437759Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9438185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9438312Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9438762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9439077Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9439520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9439641Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9440046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9440182Z     return self._compile_to_module()
2025-12-04T10:35:20.9440595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9440769Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9441203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9441311Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9441730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9441920Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9442420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9442526Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9442947Z   File "/tmp/tmpm6h_b101/xc/cxctxutdvenp3i3aoeg7ligi7mcyhn4myizgftgokhg7dgz6oocf.py", line 51, in <module>
2025-12-04T10:35:20.9443339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9443431Z     kernel.precompile(
2025-12-04T10:35:20.9443908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9444004Z     self._precompile_worker()
2025-12-04T10:35:20.9444506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9444653Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9445157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9445323Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9445706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9445909Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9446333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9446615Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9446806Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9447070Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9447214Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9447334Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9447418Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9447491Z     x0 = xindex
2025-12-04T10:35:20.9447632Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9447726Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9447796Z            ^
2025-12-04T10:35:20.9448122Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9448128Z 
2025-12-04T10:35:20.9448730Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9448779Z 
2025-12-04T10:35:20.9448783Z 
2025-12-04T10:35:20.9448966Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9449657Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9449662Z 
2025-12-04T10:35:20.9449922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9450103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9450183Z frames [('total', 1)]
2025-12-04T10:35:20.9450324Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9450722Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9450908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9450987Z graph_break []
2025-12-04T10:35:20.9451160Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9451240Z frames [('total', 1)]
2025-12-04T10:35:20.9451340Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9451520Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9451914Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9451994Z graph_break []
2025-12-04T10:35:20.9452110Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9452397Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9452497Z Traceback (most recent call last):
2025-12-04T10:35:20.9452805Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9452913Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9453328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9453537Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9453974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9454130Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9454564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9454679Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9455128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9455401Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9455837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9455958Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9456409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9456506Z     return self._compile_to_module()
2025-12-04T10:35:20.9456917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9457048Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9457484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9457590Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9458050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9458241Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9458734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9458840Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9459316Z   File "/tmp/tmpif5dtou7/lx/clx56sov4ibdgkhsbhibfga667qsrwikk7i4vjlko3qqsg4u4sje.py", line 51, in <module>
2025-12-04T10:35:20.9459751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9459842Z     kernel.precompile(
2025-12-04T10:35:20.9460350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9460440Z     self._precompile_worker()
2025-12-04T10:35:20.9460955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9461100Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9461607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9461769Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9462147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9462353Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9462725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9463010Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9463204Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9463466Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9463565Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9463674Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9463759Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9463839Z     x0 = xindex
2025-12-04T10:35:20.9463971Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9464064Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9467831Z            ^
2025-12-04T10:35:20.9468175Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9468180Z 
2025-12-04T10:35:20.9468806Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9468814Z 
2025-12-04T10:35:20.9468818Z 
2025-12-04T10:35:20.9469000Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9469700Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9469706Z 
2025-12-04T10:35:20.9470076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9470260Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9470354Z frames [('total', 1)]
2025-12-04T10:35:20.9470451Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9470851Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9471050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9471134Z graph_break []
2025-12-04T10:35:20.9471392Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9471480Z frames [('total', 1)]
2025-12-04T10:35:20.9471574Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9471762Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9472161Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9472241Z graph_break []
2025-12-04T10:35:20.9472463Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9472551Z frames [('total', 1)]
2025-12-04T10:35:20.9472649Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9472834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9473266Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9473359Z graph_break []
2025-12-04T10:35:20.9473917Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml -
2025-12-04T10:35:20.9474059Z =========================== short test summary info ============================
2025-12-04T10:35:20.9474756Z FAILED [0.3267s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9475027Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9475139Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9475256Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9475345Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9475435Z     x0 = xindex
2025-12-04T10:35:20.9475578Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9475676Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9475782Z            ^
2025-12-04T10:35:20.9476141Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9476146Z 
2025-12-04T10:35:20.9476761Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9476766Z 
2025-12-04T10:35:20.9476770Z 
2025-12-04T10:35:20.9476952Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9477648Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9477657Z 
2025-12-04T10:35:20.9477883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9478037Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9478213Z ================== 1 failed, 60 deselected, 2 rerun in 2.48s ===================
2025-12-04T10:35:20.9478293Z Got exit code 1
2025-12-04T10:35:20.9478380Z Retrying single test...
2025-12-04T10:35:20.9478795Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml
2025-12-04T10:35:20.9478973Z ============================= test session starts ==============================
2025-12-04T10:35:20.9479270Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9479360Z cachedir: .pytest_cache
2025-12-04T10:35:20.9479807Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9479915Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9480000Z configfile: pytest.ini
2025-12-04T10:35:20.9480466Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9480698Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.9481317Z stepcurrent: skipping 60 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9481418Z Running 1 items in this shard
2025-12-04T10:35:20.9481423Z 
2025-12-04T10:35:20.9482430Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9483083Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9483587Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9484066Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9484493Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9484852Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9485361Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9485830Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9486280Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9486716Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9487141Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9487610Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9488074Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9488371Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9489917Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9490415Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9491156Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9491582Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9492295Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9492936Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9493663Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9494122Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9494837Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9495412Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9496197Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9496896Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9497608Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9498205Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9498914Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9499555Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9500317Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9500620Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9501202Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9501503Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9501948Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9502889Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9503423Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9504187Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9504766Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9505559Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9506261Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9506829Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9507467Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9508131Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9508612Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9509031Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9509399Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9509896Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9510347Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9510702Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9511405Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9511525Z ('RERUN', {'yellow': True}) [1.7874s] [100%]
2025-12-04T10:35:20.9512492Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9513131Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9513604Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9514077Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9514496Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9514964Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9515471Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9515914Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9516347Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9516842Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9517265Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9517730Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9518247Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9518551Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9520094Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9520607Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9521341Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9521764Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9522480Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9523077Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9523807Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9524232Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9524945Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9525498Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9526279Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9527029Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9527745Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9528347Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9529064Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9529689Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9530444Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9530781Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9531364Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9531719Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9532167Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9533064Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9533601Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9534354Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9534929Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9535679Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9536336Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9536867Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9537507Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9537966Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9538442Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9538861Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9539312Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9539856Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9540328Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9540707Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9541496Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9541625Z ('RERUN', {'yellow': True}) [0.3343s] [100%]
2025-12-04T10:35:20.9542655Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9543378Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9543877Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9544346Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9544777Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9545136Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9545637Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9546129Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9546552Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9546992Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9547415Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9547884Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9548343Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9548647Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9550188Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9550642Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9551418Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9551849Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9552559Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9553158Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9553928Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9554358Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9555108Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9555718Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9556453Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9557154Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9557867Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9558470Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9559186Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9559770Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9560530Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9560830Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9561408Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9561711Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9562158Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9563048Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9563626Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9564382Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9564960Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9565753Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9566406Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9566972Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9567730Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9568393Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9568968Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9569383Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9569749Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9570249Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9570687Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9571038Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9571731Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9571822Z FAILED [0.3293s] [100%]
2025-12-04T10:35:20.9571827Z 
2025-12-04T10:35:20.9571945Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9572233Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9572339Z Traceback (most recent call last):
2025-12-04T10:35:20.9572654Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9572763Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9573174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9573387Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9573826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9573988Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9574420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9574601Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9575052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9575331Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9575771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9575897Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9576308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9576450Z     return self._compile_to_module()
2025-12-04T10:35:20.9576864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9576999Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9577439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9577549Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9578009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9578203Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9578749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9578851Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9579357Z   File "/tmp/tmph16ogg6w/qk/cqkrfvgghjdw2oxnsqjczq6z4hpx5fcyqydtg3tsmk4ty2b7phyh.py", line 51, in <module>
2025-12-04T10:35:20.9579749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9579836Z     kernel.precompile(
2025-12-04T10:35:20.9580312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9580407Z     self._precompile_worker()
2025-12-04T10:35:20.9580923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9581070Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9581577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9581840Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9582219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9582425Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9582806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9583087Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9583293Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9583556Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9583659Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9583775Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9583860Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9583936Z     x0 = xindex
2025-12-04T10:35:20.9584075Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9584173Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9584250Z            ^
2025-12-04T10:35:20.9584580Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9584585Z 
2025-12-04T10:35:20.9585242Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9585250Z 
2025-12-04T10:35:20.9585254Z 
2025-12-04T10:35:20.9585437Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9586127Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9586135Z 
2025-12-04T10:35:20.9586365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9586592Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9586676Z frames [('total', 1)]
2025-12-04T10:35:20.9586772Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9587179Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9587367Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9587444Z graph_break []
2025-12-04T10:35:20.9587768Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9587876Z Traceback (most recent call last):
2025-12-04T10:35:20.9588191Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9588333Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9588754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9588964Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9589406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9589567Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9589997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9590123Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9590579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9590858Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9591296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9591419Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9591836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9591934Z     return self._compile_to_module()
2025-12-04T10:35:20.9592350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9592491Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9592927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9593045Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9593470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9593661Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9594162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9594264Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9594757Z   File "/tmp/tmpxjl3qbd0/pz/cpzjikpmr2b667q33ht6zd63esxhsqxbwgwz32qrsuxi6ypmlhuo.py", line 51, in <module>
2025-12-04T10:35:20.9595151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9595240Z     kernel.precompile(
2025-12-04T10:35:20.9595720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9595818Z     self._precompile_worker()
2025-12-04T10:35:20.9596323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9596518Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9597024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9597192Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9597571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9597772Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9598226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9598507Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9598748Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9599014Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9599118Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9599234Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9599323Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9599396Z     x0 = xindex
2025-12-04T10:35:20.9599540Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9599637Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9599708Z            ^
2025-12-04T10:35:20.9600044Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9600051Z 
2025-12-04T10:35:20.9600656Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9600663Z 
2025-12-04T10:35:20.9600667Z 
2025-12-04T10:35:20.9600859Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9601550Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9601557Z 
2025-12-04T10:35:20.9601791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9601971Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9602053Z frames [('total', 1)]
2025-12-04T10:35:20.9602153Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9602551Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9602734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9602821Z graph_break []
2025-12-04T10:35:20.9602996Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9603084Z frames [('total', 1)]
2025-12-04T10:35:20.9603180Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9603362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9603760Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9603839Z graph_break []
2025-12-04T10:35:20.9604002Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9604294Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9604399Z Traceback (most recent call last):
2025-12-04T10:35:20.9604714Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9604817Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9605231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9605444Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9605964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9606131Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9606568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9606686Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9607185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9607456Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9608094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9608222Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9608636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9608741Z     return self._compile_to_module()
2025-12-04T10:35:20.9609151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9609285Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9609728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9609835Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9610258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9610468Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9610970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9611083Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9611515Z   File "/tmp/tmpvuuppred/c5/cc5cgexulgv3lppnv2u6q5gbbeex5lylepyk2nin73lxcc6xd22t.py", line 51, in <module>
2025-12-04T10:35:20.9611913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9612013Z     kernel.precompile(
2025-12-04T10:35:20.9612490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9612594Z     self._precompile_worker()
2025-12-04T10:35:20.9613107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9613261Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9613773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9613938Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9614314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9614596Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9614967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9615250Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9615445Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9615713Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9615816Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9615934Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9616130Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9616223Z     x0 = xindex
2025-12-04T10:35:20.9616361Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9616455Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9616527Z            ^
2025-12-04T10:35:20.9616855Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9616859Z 
2025-12-04T10:35:20.9617522Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9617527Z 
2025-12-04T10:35:20.9617531Z 
2025-12-04T10:35:20.9617712Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9618469Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9618476Z 
2025-12-04T10:35:20.9618697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9618872Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9618959Z frames [('total', 1)]
2025-12-04T10:35:20.9619099Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9619499Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9619685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9619760Z graph_break []
2025-12-04T10:35:20.9619939Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9620022Z frames [('total', 1)]
2025-12-04T10:35:20.9620110Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9620295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9620688Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9620766Z graph_break []
2025-12-04T10:35:20.9620940Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9621017Z frames [('total', 1)]
2025-12-04T10:35:20.9621113Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9621296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9621684Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9621762Z graph_break []
2025-12-04T10:35:20.9622320Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml -
2025-12-04T10:35:20.9622468Z =========================== short test summary info ============================
2025-12-04T10:35:20.9623154Z FAILED [0.3293s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9623426Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9623527Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9623686Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9623776Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9623853Z     x0 = xindex
2025-12-04T10:35:20.9623990Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9624086Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9624157Z            ^
2025-12-04T10:35:20.9624480Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9624488Z 
2025-12-04T10:35:20.9625091Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9625139Z 
2025-12-04T10:35:20.9625142Z 
2025-12-04T10:35:20.9625321Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9626075Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9626080Z 
2025-12-04T10:35:20.9626344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9626494Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9626660Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ==================
2025-12-04T10:35:20.9626779Z Got exit code 1
2025-12-04T10:35:20.9626866Z Retrying single test...
2025-12-04T10:35:20.9627263Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml
2025-12-04T10:35:20.9627398Z ============================= test session starts ==============================
2025-12-04T10:35:20.9627690Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9627777Z cachedir: .pytest_cache
2025-12-04T10:35:20.9628222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9628327Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9628414Z configfile: pytest.ini
2025-12-04T10:35:20.9628882Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9629068Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.9629684Z stepcurrent: skipping 60 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9629778Z Running 1 items in this shard
2025-12-04T10:35:20.9629782Z 
2025-12-04T10:35:20.9630743Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9631390Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9631846Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9632318Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9632740Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9633096Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9633639Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9634080Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9634513Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9634936Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9635357Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9635884Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9636342Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9636643Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9638216Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9638797Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9639528Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9639951Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9640654Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9641256Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9641978Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9642401Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9643117Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9643650Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9644382Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9645076Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9645825Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9646419Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9647128Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9647713Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9648505Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9648803Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9649415Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9649711Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9650211Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9651095Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9651629Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9652382Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9652951Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9653696Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9654348Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9654868Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9655506Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9655959Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9656434Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9656847Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9657217Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9657768Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9658214Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9658557Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9659350Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9659506Z ('RERUN', {'yellow': True}) [1.7700s] [100%]
2025-12-04T10:35:20.9660465Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9661107Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9661606Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9662075Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9662531Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9662890Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9663394Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9663834Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9664257Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9664689Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9665109Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9665569Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9666024Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9666326Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9667858Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9668316Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9669038Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9669506Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9670216Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9670809Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9671530Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9671994Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9672704Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9673288Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9674017Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9674755Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9675465Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9676104Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9676814Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9677393Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9678147Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9678443Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9679020Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9679318Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9679766Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9680647Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9681176Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9682070Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9682646Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9683396Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9684095Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9684617Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9685293Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9685747Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9686273Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9686723Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9687083Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9687579Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9688022Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9688364Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9689055Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9689166Z ('RERUN', {'yellow': True}) [0.3301s] [100%]
2025-12-04T10:35:20.9690126Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9690762Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9691218Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9691683Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9692107Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9692462Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9692957Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9693439Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9693863Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9694294Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9694714Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9695175Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9695675Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9695974Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9697551Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9698038Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9698765Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9699235Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9699942Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9700535Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9701258Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9701681Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9702396Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9702936Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9703668Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9704362Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9705079Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9705720Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9706432Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9707017Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9707946Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9708245Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9708823Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9709184Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9709632Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9710567Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9711097Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9711854Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9712424Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9713167Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9713818Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9714340Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9714976Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9715431Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9715952Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9716365Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9716730Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9717225Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9717728Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9718080Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9718772Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9718862Z FAILED [0.3281s] [100%]
2025-12-04T10:35:20.9718867Z 
2025-12-04T10:35:20.9718983Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9719327Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9719427Z Traceback (most recent call last):
2025-12-04T10:35:20.9719734Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9719839Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9720254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9720503Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9721007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9721324Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9721888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9722049Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9722553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9722830Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9723276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9723404Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9723808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9723906Z     return self._compile_to_module()
2025-12-04T10:35:20.9724317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9724449Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9724885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9724991Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9725408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9725603Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9726152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9726256Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9726694Z   File "/tmp/tmpxotdeeut/wa/cwas4k5bmikkdvpmygvybq3wo6qu6hftgs6fwgnnnlpq7rkgjhxv.py", line 51, in <module>
2025-12-04T10:35:20.9727084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9727181Z     kernel.precompile(
2025-12-04T10:35:20.9727651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9727744Z     self._precompile_worker()
2025-12-04T10:35:20.9728312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9728459Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9728963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9729129Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9729510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9729715Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9730132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9730416Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9730611Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9730883Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9730986Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9731137Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9731223Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9731303Z     x0 = xindex
2025-12-04T10:35:20.9731438Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9731578Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9731654Z            ^
2025-12-04T10:35:20.9731978Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9731986Z 
2025-12-04T10:35:20.9732593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9732603Z 
2025-12-04T10:35:20.9732606Z 
2025-12-04T10:35:20.9732787Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9733479Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9733484Z 
2025-12-04T10:35:20.9733711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9733889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9733982Z frames [('total', 1)]
2025-12-04T10:35:20.9734075Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9734475Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9734666Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9734748Z graph_break []
2025-12-04T10:35:20.9735033Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9735134Z Traceback (most recent call last):
2025-12-04T10:35:20.9735442Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9735547Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9735959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9736167Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9736611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9736769Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9737197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9737315Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9737813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9738095Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9738532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9738654Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9739132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9739276Z     return self._compile_to_module()
2025-12-04T10:35:20.9739687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9739820Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9740255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9740364Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9740839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9741031Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9741531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9741675Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9742112Z   File "/tmp/tmphrmls84p/qy/cqyxrggtsu3ukvo3bajxykjun7e27sru4bfxaaako652gqnsbq3k.py", line 51, in <module>
2025-12-04T10:35:20.9742502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9742590Z     kernel.precompile(
2025-12-04T10:35:20.9743061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9743157Z     self._precompile_worker()
2025-12-04T10:35:20.9743663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9743807Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9744311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9744476Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9744853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9745057Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9745431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9745710Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9745911Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9746176Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9746275Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9746390Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9746475Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9746548Z     x0 = xindex
2025-12-04T10:35:20.9746692Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9746786Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9746856Z            ^
2025-12-04T10:35:20.9747181Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9747186Z 
2025-12-04T10:35:20.9747836Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9747841Z 
2025-12-04T10:35:20.9747850Z 
2025-12-04T10:35:20.9748034Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9748725Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9748733Z 
2025-12-04T10:35:20.9748956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9749176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9749260Z frames [('total', 1)]
2025-12-04T10:35:20.9749355Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9749754Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9749948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9750024Z graph_break []
2025-12-04T10:35:20.9750237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9750321Z frames [('total', 1)]
2025-12-04T10:35:20.9750411Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9750588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9751021Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9751103Z graph_break []
2025-12-04T10:35:20.9751220Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9751501Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9751599Z Traceback (most recent call last):
2025-12-04T10:35:20.9751913Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9752013Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9752422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9752631Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9753061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9753226Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9753655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9753773Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9754224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9754501Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9754942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9755060Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9755467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9755570Z     return self._compile_to_module()
2025-12-04T10:35:20.9759718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9759882Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9760334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9760441Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9760935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9761134Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9761635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9761748Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9762184Z   File "/tmp/tmppq3rutt5/x3/cx3pnwl6zyhlokbor4oxc2kmiejg3so3sdw4yry6dg4h76h56h5p.py", line 51, in <module>
2025-12-04T10:35:20.9762634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9762726Z     kernel.precompile(
2025-12-04T10:35:20.9763196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9763300Z     self._precompile_worker()
2025-12-04T10:35:20.9763809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9764028Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9764537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9764745Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9765130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9765336Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9765705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9765992Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9766192Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9766465Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9766563Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9766676Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9766771Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9766847Z     x0 = xindex
2025-12-04T10:35:20.9766980Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9767082Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9767154Z            ^
2025-12-04T10:35:20.9767481Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9767486Z 
2025-12-04T10:35:20.9768096Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9768104Z 
2025-12-04T10:35:20.9768108Z 
2025-12-04T10:35:20.9768292Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9769001Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9769006Z 
2025-12-04T10:35:20.9769231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9769413Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9769495Z frames [('total', 1)]
2025-12-04T10:35:20.9769588Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9769994Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9770183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9770308Z graph_break []
2025-12-04T10:35:20.9770490Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9770573Z frames [('total', 1)]
2025-12-04T10:35:20.9770675Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9770857Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9771250Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9771334Z graph_break []
2025-12-04T10:35:20.9771511Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9771638Z frames [('total', 1)]
2025-12-04T10:35:20.9771736Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9771915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9772315Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9772405Z graph_break []
2025-12-04T10:35:20.9772961Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml -
2025-12-04T10:35:20.9773153Z =========================== short test summary info ============================
2025-12-04T10:35:20.9773842Z FAILED [0.3281s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9774148Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9774254Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9774367Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9774457Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9774533Z     x0 = xindex
2025-12-04T10:35:20.9774670Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9774774Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9774845Z            ^
2025-12-04T10:35:20.9775169Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9775177Z 
2025-12-04T10:35:20.9775783Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9775791Z 
2025-12-04T10:35:20.9775794Z 
2025-12-04T10:35:20.9775972Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9776668Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9776675Z 
2025-12-04T10:35:20.9776897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9777054Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9777218Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ==================
2025-12-04T10:35:20.9777295Z Got exit code 1
2025-12-04T10:35:20.9777784Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9778134Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.9778535Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml
2025-12-04T10:35:20.9778677Z ============================= test session starts ==============================
2025-12-04T10:35:20.9778965Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9779120Z cachedir: .pytest_cache
2025-12-04T10:35:20.9779616Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9779716Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9779807Z configfile: pytest.ini
2025-12-04T10:35:20.9780273Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9780471Z collecting ... collected 188 items / 61 deselected / 127 selected
2025-12-04T10:35:20.9780590Z stepcurrent: skipping 61 already run items.
2025-12-04T10:35:20.9780680Z Running 127 items in this shard
2025-12-04T10:35:20.9780685Z 
2025-12-04T10:35:20.9781622Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9782309Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9782809Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9783283Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9783735Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9784103Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9784610Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9785055Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9785487Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9785915Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9786342Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9786802Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9787267Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9787567Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9789115Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9789571Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9790301Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9790775Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9791479Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9792084Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9792803Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9793279Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9793989Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9794566Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9795304Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9796081Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9796809Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9797402Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9798120Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9798697Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9799452Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9799762Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9800340Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9800646Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9801092Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9801978Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9802510Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9803300Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9803882Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9804624Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9805291Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9805877Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9806527Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9807024Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9807495Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9808129Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9808495Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9809010Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9809452Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9809798Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9810494Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9810601Z ('RERUN', {'yellow': True}) [1.7889s] [  0%]
2025-12-04T10:35:20.9811532Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9812169Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9812639Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9813110Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9813524Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9813888Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9814389Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9814841Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9815343Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9815797Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9816250Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9816712Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9817179Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9817532Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9819249Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9819759Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9820483Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9820917Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9821620Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9822222Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9822940Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9823381Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9824090Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9824630Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9825371Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9826062Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9826778Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9827406Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9828126Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9828701Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9829449Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9829794Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9830370Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9830668Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9831155Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9832046Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9832614Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9833363Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9833947Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9834691Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9835350Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9835889Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9836567Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9837022Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9837497Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9837928Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9838290Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9838806Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9839246Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9839636Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9840350Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9840463Z ('RERUN', {'yellow': True}) [0.3298s] [  0%]
2025-12-04T10:35:20.9841396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9842079Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9842548Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9843068Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9843480Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9843852Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9844391Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9844843Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9845268Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9845702Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9846162Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9846644Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9847109Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9847406Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9848945Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9849399Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9850124Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9850562Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9851330Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9851937Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9852653Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9853083Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9853834Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9854372Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9855149Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9855849Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9856767Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9857538Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9858284Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9858866Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9859667Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9859987Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9860565Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9860868Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9861315Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9862199Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9862730Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9863480Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9864121Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9864873Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9865528Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9866048Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9866738Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9867195Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9867703Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9868123Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9868523Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9869027Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9869469Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9869813Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9870607Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9870697Z FAILED [0.3271s] [  0%]
2025-12-04T10:35:20.9870702Z 
2025-12-04T10:35:20.9870826Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9871094Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:20.9871199Z Traceback (most recent call last):
2025-12-04T10:35:20.9871517Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9871625Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9872043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9872255Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9872694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9872859Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9873291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9873415Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9873874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9874150Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9874597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9874769Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9875179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9875287Z     return self._compile_to_module()
2025-12-04T10:35:20.9875699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9875837Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9876278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9876428Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9876847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9877043Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9877540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9877649Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9878121Z   File "/tmp/tmp6bdnmq07/i2/ci24zwedewtteulurulj2yqpzur36uxl7jsfe23vzvkjvdvmqmz5.py", line 51, in <module>
2025-12-04T10:35:20.9878521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9878650Z     kernel.precompile(
2025-12-04T10:35:20.9879119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9879225Z     self._precompile_worker()
2025-12-04T10:35:20.9879735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9879892Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9880398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9880565Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9880958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9881161Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9881540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9881833Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9882035Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9882307Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9882410Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9882529Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9882627Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9882704Z     x0 = xindex
2025-12-04T10:35:20.9882847Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9882948Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9883020Z            ^
2025-12-04T10:35:20.9883357Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9883365Z 
2025-12-04T10:35:20.9883973Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9883979Z 
2025-12-04T10:35:20.9883983Z 
2025-12-04T10:35:20.9884167Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9884894Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9884900Z 
2025-12-04T10:35:20.9885128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9885322Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9885403Z frames [('total', 1)]
2025-12-04T10:35:20.9885498Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9885903Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9886098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9886228Z graph_break []
2025-12-04T10:35:20.9886492Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:20.9886594Z Traceback (most recent call last):
2025-12-04T10:35:20.9886911Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9887020Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9887434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9887688Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9888128Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9888364Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9888797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9888923Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9889377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9889653Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9890105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9890227Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9890630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9890743Z     return self._compile_to_module()
2025-12-04T10:35:20.9891152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9891289Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9891732Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9891836Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9892266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9892460Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9892958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9893071Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9893503Z   File "/tmp/tmpfwefhtvo/le/cleqsmrvkhej5ymfpal7rs462idmf4ikyw24x6hg226j3bk5u7iz.py", line 51, in <module>
2025-12-04T10:35:20.9893907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9894001Z     kernel.precompile(
2025-12-04T10:35:20.9894470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9894571Z     self._precompile_worker()
2025-12-04T10:35:20.9895122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9895272Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9895783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9895947Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9896334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9896539Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9896953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9897242Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9897440Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9897712Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9897814Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9897966Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9898065Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9898141Z     x0 = xindex
2025-12-04T10:35:20.9898277Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9898420Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9898494Z            ^
2025-12-04T10:35:20.9898823Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9898830Z 
2025-12-04T10:35:20.9899489Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9899493Z 
2025-12-04T10:35:20.9899497Z 
2025-12-04T10:35:20.9899679Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9900360Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9900365Z 
2025-12-04T10:35:20.9900587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9900777Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9900860Z frames [('total', 1)]
2025-12-04T10:35:20.9900955Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9901366Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9901555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9901632Z graph_break []
2025-12-04T10:35:20.9901815Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9901902Z frames [('total', 1)]
2025-12-04T10:35:20.9902001Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9902186Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9902581Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9902668Z graph_break []
2025-12-04T10:35:20.9902786Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9903044Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:20.9903156Z Traceback (most recent call last):
2025-12-04T10:35:20.9903469Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9903580Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9904041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9904253Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9904689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9904847Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9905279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9905410Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9905901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9906175Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9906621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9906740Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9907193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9907289Z     return self._compile_to_module()
2025-12-04T10:35:20.9907699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9908039Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9908473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9908589Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9909004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9909194Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9909701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9909807Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9910239Z   File "/tmp/tmpggefi57i/ae/caea5evwbb6enzzbyc6agqavzrfi3hoa7mvo62gulno5ykg47bu6.py", line 51, in <module>
2025-12-04T10:35:20.9910629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9910718Z     kernel.precompile(
2025-12-04T10:35:20.9911187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9911283Z     self._precompile_worker()
2025-12-04T10:35:20.9911790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9911939Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9912441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9912607Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9912981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9913197Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9913564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9913842Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9914036Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9914300Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9914477Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9914593Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9914677Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9914756Z     x0 = xindex
2025-12-04T10:35:20.9914894Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9914990Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9915067Z            ^
2025-12-04T10:35:20.9915395Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9915400Z 
2025-12-04T10:35:20.9916003Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9916071Z 
2025-12-04T10:35:20.9916078Z 
2025-12-04T10:35:20.9916257Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9916931Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9916936Z 
2025-12-04T10:35:20.9917212Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9917389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9917473Z frames [('total', 1)]
2025-12-04T10:35:20.9917624Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9918019Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9918208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9918286Z graph_break []
2025-12-04T10:35:20.9918459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9918545Z frames [('total', 1)]
2025-12-04T10:35:20.9918636Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9918824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9919221Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9919298Z graph_break []
2025-12-04T10:35:20.9919474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9919557Z frames [('total', 1)]
2025-12-04T10:35:20.9919649Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9919839Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9920227Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9920304Z graph_break []
2025-12-04T10:35:20.9920859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml -
2025-12-04T10:35:20.9920999Z =========================== short test summary info ============================
2025-12-04T10:35:20.9921656Z FAILED [0.3271s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9921921Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9922022Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9922139Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9922225Z     xmask = xindex < xnumel
2025-12-04T10:35:20.9922297Z     x0 = xindex
2025-12-04T10:35:20.9922434Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9922527Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9922604Z            ^
2025-12-04T10:35:20.9922925Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9922930Z 
2025-12-04T10:35:20.9923581Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9923586Z 
2025-12-04T10:35:20.9923592Z 
2025-12-04T10:35:20.9923771Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9924436Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9924443Z 
2025-12-04T10:35:20.9924668Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9924854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9925018Z ================== 1 failed, 61 deselected, 2 rerun in 2.48s ===================
2025-12-04T10:35:20.9925101Z Got exit code 1
2025-12-04T10:35:20.9925184Z Retrying single test...
2025-12-04T10:35:20.9925593Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml
2025-12-04T10:35:20.9925798Z ============================= test session starts ==============================
2025-12-04T10:35:20.9926089Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9926180Z cachedir: .pytest_cache
2025-12-04T10:35:20.9926661Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9926761Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9926853Z configfile: pytest.ini
2025-12-04T10:35:20.9927308Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9927494Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.9928092Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9928184Z Running 1 items in this shard
2025-12-04T10:35:20.9928188Z 
2025-12-04T10:35:20.9929122Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9929766Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9930229Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9930700Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9931110Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9931472Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9931971Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9932421Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9932845Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9933271Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9933740Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9934204Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9934661Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9934958Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9936550Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9937080Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9937812Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9938275Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9938979Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9939645Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9940363Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9940787Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9941497Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9942040Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9942772Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9943466Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9944186Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9944775Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9945487Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9946161Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9946924Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9947222Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9947792Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9948656Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9949103Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9950029Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9950562Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9951352Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9952009Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9952759Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9953421Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9953940Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9954584Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9955043Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9955520Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9955985Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9956344Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9956843Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9957285Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9957637Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9958376Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9958485Z ('RERUN', {'yellow': True}) [1.7678s] [100%]
2025-12-04T10:35:20.9959417Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9960058Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9960561Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9961033Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9961448Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9961851Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9962351Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9962831Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9963253Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9963685Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9964109Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9964566Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9965030Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9965324Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9966922Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9967376Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9968108Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9968530Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9969235Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9969834Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9970598Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9971025Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9971735Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9972343Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9973076Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9973803Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9974515Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9975143Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9975898Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9976490Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9977245Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9977541Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9978120Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9978429Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9978875Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9979813Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9980347Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9981101Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9981676Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9982465Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9983120Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9983643Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9984282Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9984781Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9985254Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9985676Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9986070Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9986573Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9987050Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9987405Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:20.9988100Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9988212Z ('RERUN', {'yellow': True}) [0.3298s] [100%]
2025-12-04T10:35:20.9989152Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9989789Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9990246Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9990722Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9991137Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:20.9991499Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:20.9991995Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9992435Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9992860Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9993293Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9993714Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9994217Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9994680Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9994976Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.9996569Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9997146Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9997908Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9998332Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:20.9999070Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9999674Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0000393Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0000821Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0001529Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0002073Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0002806Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0003498Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0004214Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0004802Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0005514Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0006133Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0006888Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0007184Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0007927Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0008233Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0008779Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0009730Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0010354Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0011164Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0011796Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0012540Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0013197Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0013713Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0014352Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0014807Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0015284Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0015698Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0016055Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0016557Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0016993Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0017343Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0018036Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0018119Z FAILED [0.3293s] [100%]
2025-12-04T10:35:21.0018124Z 
2025-12-04T10:35:21.0018329Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0018596Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0018701Z Traceback (most recent call last):
2025-12-04T10:35:21.0019011Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0019159Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0019606Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0019827Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0020308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0020472Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0020904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0021031Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0021520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0021794Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0022330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0022502Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0023057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0023192Z     return self._compile_to_module()
2025-12-04T10:35:21.0023721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0023871Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0024312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0024417Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0024836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0025034Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0025538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0025642Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0026076Z   File "/tmp/tmpwp7kngc6/ap/capzrkg6dqv6xdacrwaqz3rrd7odavimxjzojflop2yh27s4yo2c.py", line 51, in <module>
2025-12-04T10:35:21.0026475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0026562Z     kernel.precompile(
2025-12-04T10:35:21.0027038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0027132Z     self._precompile_worker()
2025-12-04T10:35:21.0027635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0027786Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0028293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0028457Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0028843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0029108Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0029485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0029766Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0029958Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0030231Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0030328Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0030486Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0030575Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0030648Z     x0 = xindex
2025-12-04T10:35:21.0030787Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0030881Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0030948Z            ^
2025-12-04T10:35:21.0031278Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0031283Z 
2025-12-04T10:35:21.0031929Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0031935Z 
2025-12-04T10:35:21.0031939Z 
2025-12-04T10:35:21.0032122Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0032838Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0032845Z 
2025-12-04T10:35:21.0033066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0033246Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0033330Z frames [('total', 1)]
2025-12-04T10:35:21.0033428Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0033827Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0034011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0034089Z graph_break []
2025-12-04T10:35:21.0034349Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0034449Z Traceback (most recent call last):
2025-12-04T10:35:21.0034765Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0034865Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0035283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0035488Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0035976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0036138Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0036569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0036685Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0037141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0037411Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0037854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0037973Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0038424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0038527Z     return self._compile_to_module()
2025-12-04T10:35:21.0038937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0039070Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0039507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0039613Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0040032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0040265Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0040759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0040867Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0041269Z   File "/tmp/tmp_pbmoxbe/a3/ca3u2ajjun42444g6dvyz6egrpl3erlmmvy5h745rrbezfqzfbrp.py", line 51, in <module>
2025-12-04T10:35:21.0041703Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0041788Z     kernel.precompile(
2025-12-04T10:35:21.0042256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0042392Z     self._precompile_worker()
2025-12-04T10:35:21.0042894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0043042Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0043542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0043706Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0044085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0044290Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0044661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0044948Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0045140Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0045411Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0045509Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0045619Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0045709Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0045779Z     x0 = xindex
2025-12-04T10:35:21.0045915Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0046011Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0046084Z            ^
2025-12-04T10:35:21.0046411Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0046416Z 
2025-12-04T10:35:21.0047024Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0047032Z 
2025-12-04T10:35:21.0047036Z 
2025-12-04T10:35:21.0047216Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0047887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0047892Z 
2025-12-04T10:35:21.0051990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0052204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0052291Z frames [('total', 1)]
2025-12-04T10:35:21.0052392Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0052800Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0052991Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0053080Z graph_break []
2025-12-04T10:35:21.0053257Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0053388Z frames [('total', 1)]
2025-12-04T10:35:21.0053487Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0053667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0054064Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0054151Z graph_break []
2025-12-04T10:35:21.0054269Z =================================== FAILURES ===================================
2025-12-04T10:35:21.0054583Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0054684Z Traceback (most recent call last):
2025-12-04T10:35:21.0055003Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0055184Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0055599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0055808Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0056249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0056411Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0056851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0056973Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0057425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0057704Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0058147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0058276Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0058681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0058782Z     return self._compile_to_module()
2025-12-04T10:35:21.0059269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0059407Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0059846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0059962Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0060381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0060582Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0061079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0061181Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0061625Z   File "/tmp/tmpsmypz3ex/ia/ciawyuxkqmtttxywd36rbim2duyljfnbmfmvp2sqsqc2jltyyr3q.py", line 51, in <module>
2025-12-04T10:35:21.0062065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0062168Z     kernel.precompile(
2025-12-04T10:35:21.0062641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0062737Z     self._precompile_worker()
2025-12-04T10:35:21.0063249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0063394Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0063945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0064113Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0064492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0064702Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0065112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0065394Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0065650Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0065954Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0066059Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0066172Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0066259Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0066340Z     x0 = xindex
2025-12-04T10:35:21.0066477Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0066575Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0066659Z            ^
2025-12-04T10:35:21.0066987Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0066992Z 
2025-12-04T10:35:21.0067600Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0067614Z 
2025-12-04T10:35:21.0067618Z 
2025-12-04T10:35:21.0067796Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0068469Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0068477Z 
2025-12-04T10:35:21.0068711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0068890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0068985Z frames [('total', 1)]
2025-12-04T10:35:21.0069084Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0069485Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0069678Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0069758Z graph_break []
2025-12-04T10:35:21.0069935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0070023Z frames [('total', 1)]
2025-12-04T10:35:21.0070116Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0070305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0070697Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0070777Z graph_break []
2025-12-04T10:35:21.0071009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0071097Z frames [('total', 1)]
2025-12-04T10:35:21.0071189Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0071379Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0071767Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0071846Z graph_break []
2025-12-04T10:35:21.0072408Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml -
2025-12-04T10:35:21.0072591Z =========================== short test summary info ============================
2025-12-04T10:35:21.0073247Z FAILED [0.3293s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0073519Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0073621Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0073743Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0073877Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0073962Z     x0 = xindex
2025-12-04T10:35:21.0074099Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0074194Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0074308Z            ^
2025-12-04T10:35:21.0074635Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0074640Z 
2025-12-04T10:35:21.0075247Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0075258Z 
2025-12-04T10:35:21.0075262Z 
2025-12-04T10:35:21.0075441Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0076112Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0076116Z 
2025-12-04T10:35:21.0076350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0076498Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.0076672Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ==================
2025-12-04T10:35:21.0076754Z Got exit code 1
2025-12-04T10:35:21.0076845Z Retrying single test...
2025-12-04T10:35:21.0077255Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml
2025-12-04T10:35:21.0077386Z ============================= test session starts ==============================
2025-12-04T10:35:21.0077679Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.0077776Z cachedir: .pytest_cache
2025-12-04T10:35:21.0078222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.0078328Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.0078414Z configfile: pytest.ini
2025-12-04T10:35:21.0078870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.0079068Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.0079665Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0079760Z Running 1 items in this shard
2025-12-04T10:35:21.0079765Z 
2025-12-04T10:35:21.0080750Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0081399Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0081869Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0082344Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0082805Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0083167Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0083667Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0084155Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0084579Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0085056Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0085480Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0085947Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0086411Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0086708Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0088254Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0088713Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0089453Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0089879Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0090592Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0091195Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0091916Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0092389Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0093105Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0093645Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0094380Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0095116Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0095831Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0096483Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0097246Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0097825Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0098586Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0098884Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0099548Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0099846Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0100298Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0101188Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0101720Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0102484Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0103053Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0103798Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0104452Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0105016Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0105668Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0106124Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0106611Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0107067Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0107429Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0108205Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0108728Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0109077Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0109827Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0109937Z ('RERUN', {'yellow': True}) [1.7682s] [100%]
2025-12-04T10:35:21.0110875Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0111512Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0111972Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0112442Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0112857Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0113218Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0113718Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0114165Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0114590Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0115022Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0115446Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0115905Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0116375Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0116733Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0118276Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0118786Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0119527Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0119957Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0120696Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0121339Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0122058Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0122489Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0123202Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0123745Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0124481Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0125179Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0125895Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0126484Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0127200Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0127779Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0128535Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0128878Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0129458Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0129755Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0130207Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0131096Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0131670Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0132466Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0133039Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0133895Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0134755Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0135374Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0136020Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0136477Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0136954Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0137370Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0137731Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0138242Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0138684Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0139087Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0139787Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0139900Z ('RERUN', {'yellow': True}) [0.3289s] [100%]
2025-12-04T10:35:21.0140835Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0141537Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0142003Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0142473Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0142894Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0143321Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0143816Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0144266Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0144728Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0145164Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0145627Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0146087Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0146551Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0146851Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0148386Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0148841Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0149680Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0150113Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0150825Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0151428Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0152150Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0152581Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0153337Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0153880Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0154610Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0155307Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0156069Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0156695Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0157414Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0158035Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0158794Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0159092Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0159671Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0159973Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0160423Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0161320Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0161859Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0162618Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0163190Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0163932Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0164590Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0165108Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0165799Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0166258Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0166731Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0167149Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0167548Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0168048Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0168487Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0168888Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0169589Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0169710Z FAILED [0.3302s] [100%]
2025-12-04T10:35:21.0169715Z 
2025-12-04T10:35:21.0169845Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0170111Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0170223Z Traceback (most recent call last):
2025-12-04T10:35:21.0170533Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0170639Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0171068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0171282Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0171726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0171892Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0172326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0172455Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0172909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0173183Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0173634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0173760Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0174175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0174274Z     return self._compile_to_module()
2025-12-04T10:35:21.0174683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0174831Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0175271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0175383Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0175969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0176166Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0176674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0176777Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0177188Z   File "/tmp/tmpv1kk58_t/23/c23rvff6ei43cri4cmsllnhtvyo3jgw6uba26koyxxnkvhj5fise.py", line 51, in <module>
2025-12-04T10:35:21.0177591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0177732Z     kernel.precompile(
2025-12-04T10:35:21.0178208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0178309Z     self._precompile_worker()
2025-12-04T10:35:21.0178818Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0178968Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0179566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0179739Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0180159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0180364Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0180746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0181027Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0181224Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0181500Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0181601Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0181718Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0181805Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0181876Z     x0 = xindex
2025-12-04T10:35:21.0182021Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0182117Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0182185Z            ^
2025-12-04T10:35:21.0182521Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0182528Z 
2025-12-04T10:35:21.0183136Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0183141Z 
2025-12-04T10:35:21.0183145Z 
2025-12-04T10:35:21.0183330Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0184004Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0184009Z 
2025-12-04T10:35:21.0184235Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0184415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0184499Z frames [('total', 1)]
2025-12-04T10:35:21.0184594Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0184994Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0185178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0185262Z graph_break []
2025-12-04T10:35:21.0185603Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0185714Z Traceback (most recent call last):
2025-12-04T10:35:21.0186076Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0186180Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0186594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0186805Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0187242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0187454Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0187891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0188017Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0188472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0188789Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0189235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0189395Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0189806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0189906Z     return self._compile_to_module()
2025-12-04T10:35:21.0190313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0190452Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0190890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0190993Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0191419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0191615Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0192120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0192223Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0192634Z   File "/tmp/tmp3oz_wrbl/37/c373tt76ok5bcbnefwvhgadbdhogznnoubl3wkrtxrqgapg67i35.py", line 51, in <module>
2025-12-04T10:35:21.0193040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0193130Z     kernel.precompile(
2025-12-04T10:35:21.0193618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0193712Z     self._precompile_worker()
2025-12-04T10:35:21.0194220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0194371Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0194877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0195049Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0195436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0195640Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0196077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0196359Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0196560Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0196833Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0196932Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0197047Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0197135Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0197205Z     x0 = xindex
2025-12-04T10:35:21.0197396Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0197494Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0197563Z            ^
2025-12-04T10:35:21.0197892Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0197897Z 
2025-12-04T10:35:21.0198503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0198507Z 
2025-12-04T10:35:21.0198551Z 
2025-12-04T10:35:21.0198735Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0199406Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0199450Z 
2025-12-04T10:35:21.0199672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0199857Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0199936Z frames [('total', 1)]
2025-12-04T10:35:21.0200030Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0200437Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0200620Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0200701Z graph_break []
2025-12-04T10:35:21.0200882Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0200963Z frames [('total', 1)]
2025-12-04T10:35:21.0201056Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0201237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0201633Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0201712Z graph_break []
2025-12-04T10:35:21.0201830Z =================================== FAILURES ===================================
2025-12-04T10:35:21.0202095Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0202193Z Traceback (most recent call last):
2025-12-04T10:35:21.0202508Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0202618Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0203029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0203245Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0203679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0203836Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0204269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0204386Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0204884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0205155Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0205595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0205722Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0206126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0206224Z     return self._compile_to_module()
2025-12-04T10:35:21.0206639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0206813Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0207254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0207362Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0207955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0208228Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0208725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0208884Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0209320Z   File "/tmp/tmpidncq7tf/az/cazpasr2x2aohewuzq3ri4zqtffqcm3ol65dvh54dnpnkvy7cske.py", line 51, in <module>
2025-12-04T10:35:21.0209713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0209806Z     kernel.precompile(
2025-12-04T10:35:21.0210275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0210369Z     self._precompile_worker()
2025-12-04T10:35:21.0210876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0211026Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0211533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0211698Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0212074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0212279Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0212648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0212930Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0213128Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0213395Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0213495Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0213604Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0213687Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0213770Z     x0 = xindex
2025-12-04T10:35:21.0213905Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0213996Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0214072Z            ^
2025-12-04T10:35:21.0214397Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0214402Z 
2025-12-04T10:35:21.0215068Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0215073Z 
2025-12-04T10:35:21.0215077Z 
2025-12-04T10:35:21.0215255Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0215980Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0215992Z 
2025-12-04T10:35:21.0216216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0216390Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0216528Z frames [('total', 1)]
2025-12-04T10:35:21.0216618Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0217013Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0217200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0217280Z graph_break []
2025-12-04T10:35:21.0217458Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0217541Z frames [('total', 1)]
2025-12-04T10:35:21.0217673Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0217858Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0218255Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0218400Z graph_break []
2025-12-04T10:35:21.0218577Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0218662Z frames [('total', 1)]
2025-12-04T10:35:21.0218756Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0218938Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0219374Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0219461Z graph_break []
2025-12-04T10:35:21.0220014Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml -
2025-12-04T10:35:21.0220152Z =========================== short test summary info ============================
2025-12-04T10:35:21.0220806Z FAILED [0.3302s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0221077Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0221188Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0221296Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0221380Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0221454Z     x0 = xindex
2025-12-04T10:35:21.0221591Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0221694Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0221769Z            ^
2025-12-04T10:35:21.0222098Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0222105Z 
2025-12-04T10:35:21.0222713Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0222720Z 
2025-12-04T10:35:21.0222723Z 
2025-12-04T10:35:21.0222903Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0223575Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0223584Z 
2025-12-04T10:35:21.0223805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0224000Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.0224173Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ==================
2025-12-04T10:35:21.0224250Z Got exit code 1
2025-12-04T10:35:21.0224709Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0225073Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:21.0225469Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml
2025-12-04T10:35:21.0225654Z ============================= test session starts ==============================
2025-12-04T10:35:21.0225943Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.0226034Z cachedir: .pytest_cache
2025-12-04T10:35:21.0226481Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.0226580Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.0226667Z configfile: pytest.ini
2025-12-04T10:35:21.0227176Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.0227370Z collecting ... collected 188 items / 62 deselected / 126 selected
2025-12-04T10:35:21.0227531Z stepcurrent: skipping 62 already run items.
2025-12-04T10:35:21.0227624Z Running 126 items in this shard
2025-12-04T10:35:21.0227628Z 
2025-12-04T10:35:21.0228581Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0229229Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0229691Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0230171Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0230585Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0230942Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0231452Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0231896Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0232322Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0232748Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0233168Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0233629Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0234088Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0234383Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0236052Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0236509Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0237277Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0237707Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0238447Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0239045Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0239822Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0240249Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0240966Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0241501Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0242235Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0242925Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0243633Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0244234Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0244948Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0245530Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0246278Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0246579Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0247194Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0247497Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0247950Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0248837Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0249418Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0250166Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0250780Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0251521Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0252213Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0252735Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0253372Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0253837Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0254310Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0254727Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0255088Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0255583Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0256075Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0256416Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0257124Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0257232Z ('RERUN', {'yellow': True}) [1.7593s] [  0%]
2025-12-04T10:35:21.0258184Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0258828Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0259375Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0259857Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0260271Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0260631Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0261177Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0261619Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0262048Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0262542Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0262966Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0263464Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0263933Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0264236Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0265783Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0266289Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0267016Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0267445Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0268152Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0268749Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0269476Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0269903Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0270625Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0271204Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0271945Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0272636Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0273389Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0273988Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0274737Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0275317Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0276106Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0276414Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0276990Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0277285Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0277737Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0278623Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0279157Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0279909Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0280489Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0281234Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0281887Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0282407Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0283085Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0283543Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0284013Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0284428Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0284787Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0285327Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0285774Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0286119Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0286858Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0286966Z ('RERUN', {'yellow': True}) [0.3308s] [  0%]
2025-12-04T10:35:21.0287953Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0288598Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0289056Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0289539Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0289951Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0290311Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0290818Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0291266Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0291698Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0292123Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0292556Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0293014Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0293471Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0293773Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0295360Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0295844Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0296596Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0297064Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0297772Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0298406Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0299213Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0299681Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0300398Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0300936Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0301672Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0302364Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0303078Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0303674Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0304395Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0304989Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0305742Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0306042Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0306616Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0306953Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0307417Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0308445Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0308985Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0309886Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0310508Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0311364Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0312061Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0312675Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0313363Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0313855Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0314360Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0314806Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0315194Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0315728Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0316206Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0316576Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0317332Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0317417Z FAILED [0.3282s] [  0%]
2025-12-04T10:35:21.0317422Z 
2025-12-04T10:35:21.0317553Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0317853Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0317958Z Traceback (most recent call last):
2025-12-04T10:35:21.0318294Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0318410Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0318848Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0319133Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0319600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0319769Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0320235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0320365Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0320850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0321181Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0321653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0321787Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0322221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0322363Z     return self._compile_to_module()
2025-12-04T10:35:21.0322776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0322908Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0323390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0323497Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0323912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0324107Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0324602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0324712Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0325139Z   File "/tmp/tmptvikjk14/32/c32ufripxwlo6rki4djw6fc74de3sry7zb5alnkovhssl7x5mrna.py", line 51, in <module>
2025-12-04T10:35:21.0325528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0325622Z     kernel.precompile(
2025-12-04T10:35:21.0326140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0326237Z     self._precompile_worker()
2025-12-04T10:35:21.0326744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0326890Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0327401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0327562Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0327940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0328142Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0328518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0328806Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0329001Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0329268Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0329369Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0329524Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0329611Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0329687Z     x0 = xindex
2025-12-04T10:35:21.0329823Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0329919Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0329989Z            ^
2025-12-04T10:35:21.0330316Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0330324Z 
2025-12-04T10:35:21.0330928Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0330982Z 
2025-12-04T10:35:21.0330986Z 
2025-12-04T10:35:21.0331169Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0331860Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0331866Z 
2025-12-04T10:35:21.0332087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0332304Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0332391Z frames [('total', 1)]
2025-12-04T10:35:21.0332483Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0332923Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0333106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0333187Z graph_break []
2025-12-04T10:35:21.0333461Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0333557Z Traceback (most recent call last):
2025-12-04T10:35:21.0333862Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0333966Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0334377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0334586Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0335017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0335177Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0335607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0335727Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0336230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0336504Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0336942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0337065Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0337466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0337563Z     return self._compile_to_module()
2025-12-04T10:35:21.0337974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0338108Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0338548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0338649Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0339165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0343087Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0343616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0343724Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0344178Z   File "/tmp/tmpb5vcbfw5/pw/cpwvrawzilgiwzwoqywcm7v4lxfs4vbircxc32axt5tqcf5jg5ns.py", line 51, in <module>
2025-12-04T10:35:21.0344575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0344742Z     kernel.precompile(
2025-12-04T10:35:21.0345215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0345315Z     self._precompile_worker()
2025-12-04T10:35:21.0345838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0345987Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0346548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0346717Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0347217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0347434Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0347810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0348096Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0348305Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0348574Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0348680Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0348791Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0348881Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0348965Z     x0 = xindex
2025-12-04T10:35:21.0349107Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0349203Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0349289Z            ^
2025-12-04T10:35:21.0349625Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0349631Z 
2025-12-04T10:35:21.0350245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0350250Z 
2025-12-04T10:35:21.0350254Z 
2025-12-04T10:35:21.0350436Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0351126Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0351136Z 
2025-12-04T10:35:21.0351360Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0351546Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0351637Z frames [('total', 1)]
2025-12-04T10:35:21.0351732Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0352135Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0352324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0352407Z graph_break []
2025-12-04T10:35:21.0352668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0352752Z frames [('total', 1)]
2025-12-04T10:35:21.0352847Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0353038Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0353430Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0353514Z graph_break []
2025-12-04T10:35:21.0353637Z =================================== FAILURES ===================================
2025-12-04T10:35:21.0353910Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0354057Z Traceback (most recent call last):
2025-12-04T10:35:21.0354379Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0354482Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0354905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0355114Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0355592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0355759Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0356239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0356368Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0356824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0357096Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0357539Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0357659Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0358075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0358175Z     return self._compile_to_module()
2025-12-04T10:35:21.0358582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0358728Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0359174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0359282Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0359706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0359899Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0360413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0360519Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0360930Z   File "/tmp/tmpj_bc2pgi/gr/cgr4gq5h7yxqi5zjnmckvmyxjq6btl52eoikluy6yoapefty7ywm.py", line 51, in <module>
2025-12-04T10:35:21.0361329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0361417Z     kernel.precompile(
2025-12-04T10:35:21.0361886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0361988Z     self._precompile_worker()
2025-12-04T10:35:21.0362496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0362701Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0363213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0363374Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0363765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0363969Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0364349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0364679Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0364871Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0365154Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0365253Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0365369Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0365504Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0365585Z     x0 = xindex
2025-12-04T10:35:21.0365730Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0365828Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0365943Z            ^
2025-12-04T10:35:21.0366275Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0366279Z 
2025-12-04T10:35:21.0366888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0366893Z 
2025-12-04T10:35:21.0366896Z 
2025-12-04T10:35:21.0367082Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0367767Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0367772Z 
2025-12-04T10:35:21.0367998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0368188Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0368280Z frames [('total', 1)]
2025-12-04T10:35:21.0368385Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0368790Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0368981Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0369063Z graph_break []
2025-12-04T10:35:21.0369242Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0369326Z frames [('total', 1)]
2025-12-04T10:35:21.0369435Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0369615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0370020Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0370100Z graph_break []
2025-12-04T10:35:21.0370280Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0370374Z frames [('total', 1)]
2025-12-04T10:35:21.0370468Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0370650Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0371049Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0371134Z graph_break []
2025-12-04T10:35:21.0371691Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml -
2025-12-04T10:35:21.0371900Z =========================== short test summary info ============================
2025-12-04T10:35:21.0372582Z FAILED [0.3282s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0372858Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0372961Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0373073Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0373166Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0373286Z     x0 = xindex
2025-12-04T10:35:21.0373430Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0373528Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0373597Z            ^
2025-12-04T10:35:21.0373938Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0373946Z 
2025-12-04T10:35:21.0374589Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0374594Z 
2025-12-04T10:35:21.0374598Z 
2025-12-04T10:35:21.0374786Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0375474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0375520Z 
2025-12-04T10:35:21.0375750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0375931Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.0376130Z ================== 1 failed, 62 deselected, 2 rerun in 2.45s ===================
2025-12-04T10:35:21.0376216Z Got exit code 1
2025-12-04T10:35:21.0376306Z Retrying single test...
2025-12-04T10:35:21.0376708Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml
2025-12-04T10:35:21.0376848Z ============================= test session starts ==============================
2025-12-04T10:35:21.0377138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.0377233Z cachedir: .pytest_cache
2025-12-04T10:35:21.0377687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.0377789Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.0377875Z configfile: pytest.ini
2025-12-04T10:35:21.0378343Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.0378527Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.0379224Z stepcurrent: skipping 62 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0379321Z Running 1 items in this shard
2025-12-04T10:35:21.0379325Z 
2025-12-04T10:35:21.0380283Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0380931Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0381398Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0381928Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0382349Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0382716Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0383214Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0383657Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0384132Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0384561Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0384993Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0385489Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0385972Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0386364Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0387907Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0388368Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0389098Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0389538Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0390242Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0390853Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0391572Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0392000Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0392721Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0393264Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0394053Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0394749Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0395472Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0396107Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0396816Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0397443Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0398195Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0398543Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0399121Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0399430Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0399878Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0400769Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0401306Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0402055Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0402636Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0403384Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0404052Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0404576Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0405216Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0405682Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0406246Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0406679Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0407048Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0407547Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0408212Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0408558Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0409268Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0409376Z ('RERUN', {'yellow': True}) [1.7732s] [100%]
2025-12-04T10:35:21.0410406Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0411093Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0411555Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0412035Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0412451Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0412820Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0413317Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0413759Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0414195Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0414626Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0415060Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0415525Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0416013Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0416346Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0417944Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0418412Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0419176Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0419610Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0420375Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0420984Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0421741Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0422170Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0422927Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0423465Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0424204Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0424904Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0425620Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0426207Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0426918Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0427508Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0428256Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0428558Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0429130Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0429433Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0429879Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0430809Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0431344Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0432096Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0432725Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0433470Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0434197Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0434721Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0435486Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0436113Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0436724Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0437146Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0437512Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0438016Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0438481Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0438828Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0439536Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0439645Z ('RERUN', {'yellow': True}) [0.3319s] [100%]
2025-12-04T10:35:21.0440601Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0441246Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0441708Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0442190Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0442671Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0443042Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0443540Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0443992Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0444428Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0444898Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0445339Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0445801Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0446300Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0446606Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0448186Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0448645Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0449370Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0449902Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0450612Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0451224Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0451945Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0452373Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0453098Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0453642Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0454390Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0455144Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0455874Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0456473Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0457237Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0457833Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0458642Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0458956Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0459633Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0459951Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0460406Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0461295Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0461841Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0462600Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0463190Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0463935Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0464602Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0465126Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0465771Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0466246Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0466723Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0467194Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0467568Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0468071Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0468525Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0468911Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0469618Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0469700Z FAILED [0.3292s] [100%]
2025-12-04T10:35:21.0469707Z 
2025-12-04T10:35:21.0469827Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0470150Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0470252Z Traceback (most recent call last):
2025-12-04T10:35:21.0470572Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0470717Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0471133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0471358Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0471793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0471966Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0472404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0472526Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0472994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0473263Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0473706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0473837Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0474245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0474350Z     return self._compile_to_module()
2025-12-04T10:35:21.0474763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0474895Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0475342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0475456Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0475919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0476136Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0476633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0476744Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0477180Z   File "/tmp/tmpvy7pjtjg/nl/cnlezz4bkmjgo3cdm3hubkxdvfz6nhi6w5vdckr2n4n6gqyhfvo5.py", line 51, in <module>
2025-12-04T10:35:21.0477623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0477720Z     kernel.precompile(
2025-12-04T10:35:21.0478192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0478295Z     self._precompile_worker()
2025-12-04T10:35:21.0478798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0478948Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0479526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0479689Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0480076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0480280Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0480700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0480987Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0481184Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0481490Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0481601Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0481716Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0481811Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0481885Z     x0 = xindex
2025-12-04T10:35:21.0482020Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0482124Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0482198Z            ^
2025-12-04T10:35:21.0482526Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0482531Z 
2025-12-04T10:35:21.0483154Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0483159Z 
2025-12-04T10:35:21.0483166Z 
2025-12-04T10:35:21.0483345Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0484042Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0484049Z 
2025-12-04T10:35:21.0484273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0484461Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0484548Z frames [('total', 1)]
2025-12-04T10:35:21.0484641Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0485050Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0485239Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0485318Z graph_break []
2025-12-04T10:35:21.0485601Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0485701Z Traceback (most recent call last):
2025-12-04T10:35:21.0486049Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0486174Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0486594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0486809Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0487295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0487460Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0487900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0488018Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0488477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0488788Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0489225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0489347Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0489756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0489856Z     return self._compile_to_module()
2025-12-04T10:35:21.0490300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0490434Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0490916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0491022Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0491445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0491638Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0492134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0492237Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0492673Z   File "/tmp/tmpft9gncmr/ky/ckykqdr43bjcvvjgmkbi3r4vji2rl6yinuvnqvudoddwhf3lctm2.py", line 51, in <module>
2025-12-04T10:35:21.0493068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0493157Z     kernel.precompile(
2025-12-04T10:35:21.0493631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0493727Z     self._precompile_worker()
2025-12-04T10:35:21.0494231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0494378Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0494889Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0495054Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0495432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0495638Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0496005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0496295Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0496488Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0496754Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0496852Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0496966Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0497094Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0497172Z     x0 = xindex
2025-12-04T10:35:21.0497308Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0497409Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0497481Z            ^
2025-12-04T10:35:21.0497806Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0497813Z 
2025-12-04T10:35:21.0498422Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0498473Z 
2025-12-04T10:35:21.0498477Z 
2025-12-04T10:35:21.0498660Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0499401Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0499409Z 
2025-12-04T10:35:21.0499630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0499850Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0499933Z frames [('total', 1)]
2025-12-04T10:35:21.0500024Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0500428Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0500653Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0500730Z graph_break []
2025-12-04T10:35:21.0500908Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0500988Z frames [('total', 1)]
2025-12-04T10:35:21.0501077Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0501259Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0501652Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0501732Z graph_break []
2025-12-04T10:35:21.0501851Z =================================== FAILURES ===================================
2025-12-04T10:35:21.0502122Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0502226Z Traceback (most recent call last):
2025-12-04T10:35:21.0502541Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0502640Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0503052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0503256Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0503690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0503849Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0504280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0504399Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0504847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0505117Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0505557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0505676Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0506126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0506271Z     return self._compile_to_module()
2025-12-04T10:35:21.0506682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0506822Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0507257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0507366Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0507939Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0508210Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0508707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0508808Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0509240Z   File "/tmp/tmp9i36wj3s/in/cintglatnz3hxbsk7lef7gw5ajiiljp2an43jw3kh3lgbskpyppc.py", line 51, in <module>
2025-12-04T10:35:21.0509689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0509776Z     kernel.precompile(
2025-12-04T10:35:21.0510246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0510393Z     self._precompile_worker()
2025-12-04T10:35:21.0510897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0511046Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0511547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0511721Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0512098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0512299Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0512674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0512950Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0513144Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0513410Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0513508Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0513618Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0513703Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0513773Z     x0 = xindex
2025-12-04T10:35:21.0513920Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0514014Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0514083Z            ^
2025-12-04T10:35:21.0514412Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0514417Z 
2025-12-04T10:35:21.0515021Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0515028Z 
2025-12-04T10:35:21.0515032Z 
2025-12-04T10:35:21.0515218Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0515916Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0515921Z 
2025-12-04T10:35:21.0516180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0516443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0516528Z frames [('total', 1)]
2025-12-04T10:35:21.0516621Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0517019Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0517201Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0517284Z graph_break []
2025-12-04T10:35:21.0517460Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0517591Z frames [('total', 1)]
2025-12-04T10:35:21.0517683Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0517863Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0518255Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0518330Z graph_break []
2025-12-04T10:35:21.0518504Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0518587Z frames [('total', 1)]
2025-12-04T10:35:21.0518796Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0518981Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0519377Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0519494Z graph_break []
2025-12-04T10:35:21.0520051Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml -
2025-12-04T10:35:21.0520193Z =========================== short test summary info ============================
2025-12-04T10:35:21.0520864Z FAILED [0.3292s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0521139Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0521239Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0521355Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0521440Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0521511Z     x0 = xindex
2025-12-04T10:35:21.0521647Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0521749Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0521822Z            ^
2025-12-04T10:35:21.0522148Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0522156Z 
2025-12-04T10:35:21.0522758Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0522763Z 
2025-12-04T10:35:21.0522767Z 
2025-12-04T10:35:21.0522951Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0523643Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0523648Z 
2025-12-04T10:35:21.0523872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0524020Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.0524184Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ==================
2025-12-04T10:35:21.0524265Z Got exit code 1
2025-12-04T10:35:21.0524348Z Retrying single test...
2025-12-04T10:35:21.0524751Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml
2025-12-04T10:35:21.0524883Z ============================= test session starts ==============================
2025-12-04T10:35:21.0525220Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.0525311Z cachedir: .pytest_cache
2025-12-04T10:35:21.0525761Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.0525862Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.0525950Z configfile: pytest.ini
2025-12-04T10:35:21.0526410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.0526596Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.0527256Z stepcurrent: skipping 62 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0527353Z Running 1 items in this shard
2025-12-04T10:35:21.0527357Z 
2025-12-04T10:35:21.0528352Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0528998Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0529500Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0529971Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0530382Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0530748Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0531246Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0531689Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0532113Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0532537Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0532962Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0533419Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0533877Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0534174Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0535707Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0536167Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0536942Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0537375Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0538074Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0538716Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0539478Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0539903Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0540660Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0541238Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0541971Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0542664Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0543380Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0543967Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0544690Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0545268Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0546022Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0546327Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0546900Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0547198Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0547645Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0548573Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0549189Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0549940Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0550517Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0551302Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0551957Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0552511Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0553156Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0553677Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0554148Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0554568Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0554927Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0555432Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0555870Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0556215Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0556910Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0557016Z ('RERUN', {'yellow': True}) [1.7634s] [100%]
2025-12-04T10:35:21.0557965Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0558600Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0559059Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0559529Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0559947Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0560309Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0560852Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0561295Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0561717Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0562145Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0562610Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0563066Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0563525Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0563856Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0565391Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0565888Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0566617Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0567046Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0567743Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0568346Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0569063Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0569491Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0570201Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0570734Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0571469Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0572160Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0572917Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0573507Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0574221Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0574842Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0575590Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0575928Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0576554Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0576853Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0577343Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0578230Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0578759Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0579556Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0580131Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0580874Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0581529Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0582046Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0582689Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0583143Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0583613Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0584032Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0584433Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0584934Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0585372Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0585716Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0586466Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0586613Z ('RERUN', {'yellow': True}) [0.3311s] [100%]
2025-12-04T10:35:21.0587568Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0588241Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0588702Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0589208Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0589621Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0589985Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0590481Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0590927Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0591347Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0591774Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0592197Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0592658Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0593125Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0593418Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0594954Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0595408Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0596180Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0596609Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0597312Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0597921Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0598678Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0599106Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0599884Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0600418Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0601190Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0601880Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0602593Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0603183Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0603898Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0604478Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0605224Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0605527Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0606149Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0606444Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0606891Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0607979Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0608652Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0609414Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0609988Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0610731Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0611447Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0611967Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0612670Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0613124Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0613650Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0614072Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0614426Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0614925Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0615363Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0615704Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0616452Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0616533Z FAILED [0.3301s] [100%]
2025-12-04T10:35:21.0616538Z 
2025-12-04T10:35:21.0616657Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0616931Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0617030Z Traceback (most recent call last):
2025-12-04T10:35:21.0617346Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0617448Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0617860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0618071Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0618504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0618671Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0619152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0619275Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0619775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0620048Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0620493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0620612Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0621016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0621118Z     return self._compile_to_module()
2025-12-04T10:35:21.0621568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0621711Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0622144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0622251Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0622671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0622903Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0623398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0623543Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0624045Z   File "/tmp/tmpzukadt1x/xo/cxoxazj6pg7amgxbqsf4t4bkpb4hlh74aydvwarqei67mggxclsd.py", line 51, in <module>
2025-12-04T10:35:21.0624604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0625447Z     kernel.precompile(
2025-12-04T10:35:21.0626263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0627173Z     self._precompile_worker()
2025-12-04T10:35:21.0628163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0638133Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0639299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0640506Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0641397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0642456Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0643477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0644662Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0645442Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0646081Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0646571Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0646896Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0647204Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0647447Z     x0 = xindex
2025-12-04T10:35:21.0647710Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0648056Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0648317Z            ^
2025-12-04T10:35:21.0648766Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0649214Z 
2025-12-04T10:35:21.0649929Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0650663Z 
2025-12-04T10:35:21.0650667Z 
2025-12-04T10:35:21.0650851Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0651842Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0652657Z 
2025-12-04T10:35:21.0652884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0653417Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0653850Z frames [('total', 1)]
2025-12-04T10:35:21.0654094Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0654675Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0655377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0655766Z graph_break []
2025-12-04T10:35:21.0656226Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0656775Z Traceback (most recent call last):
2025-12-04T10:35:21.0657285Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0657836Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0658520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0659347Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0660121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0660847Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0661566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0662239Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0662926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0663780Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0664629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0665314Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0665961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0666585Z     return self._compile_to_module()
2025-12-04T10:35:21.0667194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0667872Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0668561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0669224Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0669850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0670588Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0671405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0672136Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0672775Z   File "/tmp/tmpcjx16ceo/fl/cflmujv3ks7wts63fowvlti4m252p46khmlznsr44g7jcvswctjo.py", line 51, in <module>
2025-12-04T10:35:21.0673777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0674384Z     kernel.precompile(
2025-12-04T10:35:21.0675005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0675696Z     self._precompile_worker()
2025-12-04T10:35:21.0676391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0677171Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0677929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0678797Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0679464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0680177Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0680874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0681700Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0682305Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0682921Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0683410Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0683731Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0684045Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0684275Z     x0 = xindex
2025-12-04T10:35:21.0684543Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0684884Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0685139Z            ^
2025-12-04T10:35:21.0685589Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0686067Z 
2025-12-04T10:35:21.0686705Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0687428Z 
2025-12-04T10:35:21.0687432Z 
2025-12-04T10:35:21.0687616Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0688604Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0689426Z 
2025-12-04T10:35:21.0689654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0690194Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0690584Z frames [('total', 1)]
2025-12-04T10:35:21.0690821Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0691404Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0692121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0692506Z graph_break []
2025-12-04T10:35:21.0692816Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0693202Z frames [('total', 1)]
2025-12-04T10:35:21.0693448Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0693806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0694506Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0695103Z graph_break []
2025-12-04T10:35:21.0695352Z =================================== FAILURES ===================================
2025-12-04T10:35:21.0695951Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0696483Z Traceback (most recent call last):
2025-12-04T10:35:21.0696996Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0697534Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0698159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0698909Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0699712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0700473Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0701179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0701862Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0702549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0703434Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0704275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0705002Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0705641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0706272Z     return self._compile_to_module()
2025-12-04T10:35:21.0706886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0707549Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0708504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0709181Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0709836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0710563Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0711378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0712120Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0712786Z   File "/tmp/tmpooxi8a7g/a7/ca7zyzfbflizizqma2de2nvotgrwa3o2dhuxblteripex2n33rzd.py", line 51, in <module>
2025-12-04T10:35:21.0713752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0714367Z     kernel.precompile(
2025-12-04T10:35:21.0715008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0715699Z     self._precompile_worker()
2025-12-04T10:35:21.0716403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0717183Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0717958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0718746Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0719421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0720124Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0720934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0721714Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0722329Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0722916Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0723402Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0723731Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0724050Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0724291Z     x0 = xindex
2025-12-04T10:35:21.0724620Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0724977Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0725244Z            ^
2025-12-04T10:35:21.0725702Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0726207Z 
2025-12-04T10:35:21.0726820Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0727545Z 
2025-12-04T10:35:21.0727610Z 
2025-12-04T10:35:21.0727792Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0728770Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0729624Z 
2025-12-04T10:35:21.0729853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0730382Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0730764Z frames [('total', 1)]
2025-12-04T10:35:21.0731006Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0731576Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0732278Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0732660Z graph_break []
2025-12-04T10:35:21.0732962Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0733378Z frames [('total', 1)]
2025-12-04T10:35:21.0733711Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0734192Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0735001Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0735597Z graph_break []
2025-12-04T10:35:21.0735916Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0736326Z frames [('total', 1)]
2025-12-04T10:35:21.0736569Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0736929Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0737616Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0738203Z graph_break []
2025-12-04T10:35:21.0738892Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml -
2025-12-04T10:35:21.0739762Z =========================== short test summary info ============================
2025-12-04T10:35:21.0740712Z FAILED [0.3301s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0741770Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0742265Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0742589Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0742897Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0743210Z     x0 = xindex
2025-12-04T10:35:21.0743469Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0743812Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0744086Z            ^
2025-12-04T10:35:21.0744536Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0744989Z 
2025-12-04T10:35:21.0745602Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0746326Z 
2025-12-04T10:35:21.0746376Z 
2025-12-04T10:35:21.0746558Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0747650Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0748460Z 
2025-12-04T10:35:21.0748691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0749180Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.0749652Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ==================
2025-12-04T10:35:21.0750015Z Got exit code 1
2025-12-04T10:35:21.0750616Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0751589Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:21.0752457Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml
2025-12-04T10:35:21.0753112Z ============================= test session starts ==============================
2025-12-04T10:35:21.0753652Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.0754156Z cachedir: .pytest_cache
2025-12-04T10:35:21.0754749Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.0755413Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.0755698Z configfile: pytest.ini
2025-12-04T10:35:21.0756343Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.0757107Z collecting ... collected 188 items / 63 deselected / 125 selected
2025-12-04T10:35:21.0757521Z stepcurrent: skipping 63 already run items.
2025-12-04T10:35:21.0757831Z Running 125 items in this shard
2025-12-04T10:35:21.0758005Z 
2025-12-04T10:35:21.0758946Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0760644Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0761850Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0762887Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0763898Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0764784Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0765802Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0766848Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0767829Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0768795Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0769771Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0770809Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0771840Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0772715Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0774709Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0776833Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0778130Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0779440Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0780685Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0782101Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0783532Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0784786Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0786039Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0787399Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0788782Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0790332Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0791895Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0793315Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0794727Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0796182Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0797751Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0798914Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0799937Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0800923Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0801779Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0803257Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0804788Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0806231Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0807667Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0809339Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0810840Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0812124Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0813398Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0814605Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0815655Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0816692Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0817579Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0818596Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0819657Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0820557Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0821714Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0822630Z ('RERUN', {'yellow': True}) [1.7683s] [  0%]
2025-12-04T10:35:21.0823836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0825503Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0826815Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0827852Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0828904Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0829793Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0830714Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0831724Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0832791Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0833764Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0834723Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0835721Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0836806Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0837678Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0839627Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0841703Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0842989Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0844292Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0845538Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0846944Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0848370Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0849670Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0850922Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0852322Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0853708Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0855316Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0856898Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0858313Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0859770Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0861170Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0862610Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0863768Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0864757Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0865739Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0866640Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0868081Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0869610Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0871051Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0872487Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0873909Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0875417Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0876738Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0878009Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0879252Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0880289Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0881328Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0882218Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0883144Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0884156Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0885053Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0886264Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0887181Z ('RERUN', {'yellow': True}) [0.3264s] [  0%]
2025-12-04T10:35:21.0888316Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0889987Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0891197Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0892237Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0893234Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0894121Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0895053Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0896102Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0897136Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0898104Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0899139Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0900143Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0901174Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0902092Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0904081Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0906202Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0907487Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0908980Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0910226Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0911646Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0913079Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0914347Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.0915600Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0916956Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0918347Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0919891Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0921416Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0922908Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0924325Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0925739Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0927226Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0928444Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0929433Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0930417Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.0931332Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0932779Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0934358Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0935751Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0937179Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0938610Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0940255Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0941632Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0942998Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0944287Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0945399Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0946518Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.0947472Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.0948461Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0949544Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0950562Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.0951804Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0952754Z FAILED [0.3286s] [  0%]
2025-12-04T10:35:21.0952912Z 
2025-12-04T10:35:21.0953042Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0953578Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.0954132Z Traceback (most recent call last):
2025-12-04T10:35:21.0954669Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0955233Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0955899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0956678Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0957552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0958270Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0958974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0959680Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0965593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0966526Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0967431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0968124Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0968788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0969422Z     return self._compile_to_module()
2025-12-04T10:35:21.0970033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0970705Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0971403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0972079Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0972714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0973447Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0974269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0974993Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0975645Z   File "/tmp/tmptdtmccgn/ud/cudjexpqa6lgej5xcbof6spokj54sh2qfgizgsuucr3igvk34tb7.py", line 51, in <module>
2025-12-04T10:35:21.0976598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0977212Z     kernel.precompile(
2025-12-04T10:35:21.0977834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0978530Z     self._precompile_worker()
2025-12-04T10:35:21.0979288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0980069Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0980902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0981701Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0982367Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0983076Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0983767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0984590Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0985197Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0985768Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0986264Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0986584Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0986899Z     xmask = xindex < xnumel
2025-12-04T10:35:21.0987131Z     x0 = xindex
2025-12-04T10:35:21.0987402Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.0987707Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0987960Z            ^
2025-12-04T10:35:21.0988403Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0988893Z 
2025-12-04T10:35:21.0989512Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0990235Z 
2025-12-04T10:35:21.0990239Z 
2025-12-04T10:35:21.0990430Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0991397Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.0992194Z 
2025-12-04T10:35:21.0992421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0992948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0993331Z frames [('total', 1)]
2025-12-04T10:35:21.0993572Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0994155Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0994870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0995248Z graph_break []
2025-12-04T10:35:21.0995648Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.0996193Z Traceback (most recent call last):
2025-12-04T10:35:21.0996710Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0997252Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0997894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0998647Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0999404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1000132Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1000844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1001527Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1002223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1003119Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1003962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1004647Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1005288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1005919Z     return self._compile_to_module()
2025-12-04T10:35:21.1006535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1007245Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1008188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1008872Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1009520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1010246Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1011136Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1011868Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1012562Z   File "/tmp/tmp3z_eqw_z/hr/chroy2zbvb2tsvxktfphex7ndibrb3ibndqekzpkqncot5kjblwz.py", line 51, in <module>
2025-12-04T10:35:21.1013484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1014090Z     kernel.precompile(
2025-12-04T10:35:21.1014723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1015405Z     self._precompile_worker()
2025-12-04T10:35:21.1016158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1016942Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1017713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1018506Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1019230Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1019993Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1020741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1021563Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1022204Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1022830Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1023354Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1023680Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1023999Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1024253Z     x0 = xindex
2025-12-04T10:35:21.1024477Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1024784Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1025042Z            ^
2025-12-04T10:35:21.1025492Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1025947Z 
2025-12-04T10:35:21.1026558Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1027290Z 
2025-12-04T10:35:21.1027361Z 
2025-12-04T10:35:21.1027545Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1028517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1029303Z 
2025-12-04T10:35:21.1029532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1030074Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1030472Z frames [('total', 1)]
2025-12-04T10:35:21.1030780Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1031352Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1032058Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1032445Z graph_break []
2025-12-04T10:35:21.1032750Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1033143Z frames [('total', 1)]
2025-12-04T10:35:21.1033387Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1033823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1034727Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1035498Z graph_break []
2025-12-04T10:35:21.1035752Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1036259Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1036751Z Traceback (most recent call last):
2025-12-04T10:35:21.1037270Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1037820Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1038438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1039173Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1039934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1040639Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1041347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1042022Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1042710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1043548Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1044392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1045076Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1045741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1046397Z     return self._compile_to_module()
2025-12-04T10:35:21.1047005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1047671Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1048452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1049117Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1049756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1050545Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1051353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1052077Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1052716Z   File "/tmp/tmpmxeh4jqx/67/c676vt6qpogoiequijtsoelb2wziv53uqvl24wrwfsormxxhr24i.py", line 51, in <module>
2025-12-04T10:35:21.1053668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1054262Z     kernel.precompile(
2025-12-04T10:35:21.1054785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1054884Z     self._precompile_worker()
2025-12-04T10:35:21.1055397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1055563Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1056138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1056317Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1056700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1057556Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1057941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1058231Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1058432Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1058711Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1058817Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1058940Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1059084Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1059163Z     x0 = xindex
2025-12-04T10:35:21.1059274Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1059369Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1059443Z            ^
2025-12-04T10:35:21.1059776Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1059781Z 
2025-12-04T10:35:21.1060391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1060396Z 
2025-12-04T10:35:21.1060400Z 
2025-12-04T10:35:21.1060592Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1061272Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1061277Z 
2025-12-04T10:35:21.1061516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1061700Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1061790Z frames [('total', 1)]
2025-12-04T10:35:21.1061898Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1062297Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1062483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1062572Z graph_break []
2025-12-04T10:35:21.1062756Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1062846Z frames [('total', 1)]
2025-12-04T10:35:21.1063001Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1063186Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1063587Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1063670Z graph_break []
2025-12-04T10:35:21.1063846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1063941Z frames [('total', 1)]
2025-12-04T10:35:21.1064032Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1064213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1064656Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1064736Z graph_break []
2025-12-04T10:35:21.1065301Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml -
2025-12-04T10:35:21.1065444Z =========================== short test summary info ============================
2025-12-04T10:35:21.1066138Z FAILED [0.3286s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1066412Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1066555Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1066680Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1066766Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1066845Z     x0 = xindex
2025-12-04T10:35:21.1066949Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1067046Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1067117Z            ^
2025-12-04T10:35:21.1067448Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1067453Z 
2025-12-04T10:35:21.1068062Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1068069Z 
2025-12-04T10:35:21.1068073Z 
2025-12-04T10:35:21.1068264Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1068931Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1068938Z 
2025-12-04T10:35:21.1069174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1069330Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1069499Z ================== 1 failed, 63 deselected, 2 rerun in 2.46s ===================
2025-12-04T10:35:21.1069591Z Got exit code 1
2025-12-04T10:35:21.1069681Z Retrying single test...
2025-12-04T10:35:21.1070085Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml
2025-12-04T10:35:21.1070228Z ============================= test session starts ==============================
2025-12-04T10:35:21.1070526Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1070624Z cachedir: .pytest_cache
2025-12-04T10:35:21.1071076Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1071177Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1071273Z configfile: pytest.ini
2025-12-04T10:35:21.1071735Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1071925Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.1072579Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1072676Z Running 1 items in this shard
2025-12-04T10:35:21.1072683Z 
2025-12-04T10:35:21.1073623Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1074270Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1074783Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1075258Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1075679Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1076089Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1076545Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1077029Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1077456Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1077883Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1078320Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1078780Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1079251Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1079552Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1081092Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1081554Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1082282Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1082713Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1083419Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1084072Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1084794Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1085226Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1085943Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1086521Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1087257Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1087989Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1088705Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1089333Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1090046Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1090623Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1091374Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1091678Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1092249Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1092551Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1093004Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1093899Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1094427Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1095176Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1095753Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1096539Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1097197Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1097714Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1098355Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1098881Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1099411Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1099828Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1100231Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1100695Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1101176Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1101523Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1102226Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1102332Z ('RERUN', {'yellow': True}) [1.7859s] [100%]
2025-12-04T10:35:21.1103268Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1103906Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1104366Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1104838Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1105250Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1105612Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1106067Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1106507Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1106935Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1107362Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1108009Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1108584Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1109058Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1109357Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1110894Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1111408Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1112191Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1112623Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1113384Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1113985Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1114705Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1115136Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1115846Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1116384Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1117121Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1117818Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1118533Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1119122Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1119838Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1120539Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1121288Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1121590Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1122166Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1122511Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1122961Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1123843Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1124416Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1125162Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1125779Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1126527Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1127179Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1127701Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1128349Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1128807Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1129275Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1129693Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1130054Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1130513Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1130953Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1131295Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1131996Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1132101Z ('RERUN', {'yellow': True}) [0.3278s] [100%]
2025-12-04T10:35:21.1133076Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1133715Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1134172Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1134685Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1135097Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1135467Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1135980Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1136453Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1136917Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1137344Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1137768Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1138226Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1138686Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1138981Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1140654Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1141115Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1141842Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1142270Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1142971Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1143571Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1144368Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1144799Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1145515Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1146101Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1146872Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1147568Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1148319Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1148905Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1149657Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1150237Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1150988Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1151291Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1151863Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1152166Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1152614Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1153495Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1154027Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1154774Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1155351Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1156096Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1156793Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1157314Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1157951Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1158411Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1158922Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1159344Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1159702Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1160208Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1160645Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1161029Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1161724Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1161810Z FAILED [0.3262s] [100%]
2025-12-04T10:35:21.1161815Z 
2025-12-04T10:35:21.1161937Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1162205Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1162305Z Traceback (most recent call last):
2025-12-04T10:35:21.1162626Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1162736Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1163146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1163364Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1163797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1163959Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1164390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1164508Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1164962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1165235Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1165691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1165835Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1166261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1166362Z     return self._compile_to_module()
2025-12-04T10:35:21.1166773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1166904Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1167393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1167502Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1167927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1168121Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1168619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1168768Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1169173Z   File "/tmp/tmpfd_whw1o/4l/c4lv633hnzo7unmp7eoiv3266ff7xyvpnmehefvqxrsess6rycdf.py", line 51, in <module>
2025-12-04T10:35:21.1169572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1169663Z     kernel.precompile(
2025-12-04T10:35:21.1170131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1170276Z     self._precompile_worker()
2025-12-04T10:35:21.1170781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1170969Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1171474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1171639Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1172017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1172218Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1172590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1172874Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1173068Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1173334Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1173438Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1173549Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1173635Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1173712Z     x0 = xindex
2025-12-04T10:35:21.1173809Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1173904Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1173972Z            ^
2025-12-04T10:35:21.1174303Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1174310Z 
2025-12-04T10:35:21.1174919Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1174928Z 
2025-12-04T10:35:21.1174932Z 
2025-12-04T10:35:21.1175116Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1175795Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1175803Z 
2025-12-04T10:35:21.1176028Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1176212Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1176299Z frames [('total', 1)]
2025-12-04T10:35:21.1176390Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1176838Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1177024Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1177100Z graph_break []
2025-12-04T10:35:21.1177371Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1177469Z Traceback (most recent call last):
2025-12-04T10:35:21.1177787Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1177889Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1178303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1178554Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1178988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1179209Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1179639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1179809Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1180269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1180607Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1181045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1181171Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1181580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1181681Z     return self._compile_to_module()
2025-12-04T10:35:21.1182090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1182226Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1182665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1182770Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1183203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1183399Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1183893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1184003Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1184446Z   File "/tmp/tmpncyqbmyv/ip/cip6fbymc5rcgxth5iwbypyzci4ugmmybsjbyrepen7saybti77z.py", line 51, in <module>
2025-12-04T10:35:21.1184836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1184930Z     kernel.precompile(
2025-12-04T10:35:21.1185400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1185496Z     self._precompile_worker()
2025-12-04T10:35:21.1186000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1186148Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1186652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1186815Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1187251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1187455Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1187833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1188117Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1188318Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1188583Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1188727Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1188843Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1188930Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1189003Z     x0 = xindex
2025-12-04T10:35:21.1189101Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1189203Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1189275Z            ^
2025-12-04T10:35:21.1189602Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1189648Z 
2025-12-04T10:35:21.1190263Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1190306Z 
2025-12-04T10:35:21.1190310Z 
2025-12-04T10:35:21.1190490Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1191172Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1191179Z 
2025-12-04T10:35:21.1191402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1191588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1191671Z frames [('total', 1)]
2025-12-04T10:35:21.1191766Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1192170Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1192354Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1192438Z graph_break []
2025-12-04T10:35:21.1192616Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1192696Z frames [('total', 1)]
2025-12-04T10:35:21.1192790Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1192972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1193363Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1193447Z graph_break []
2025-12-04T10:35:21.1193567Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1193829Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1193933Z Traceback (most recent call last):
2025-12-04T10:35:21.1194243Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1194347Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1194762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1194969Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1195406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1195565Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1196054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1196190Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1196668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1196941Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1197383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1197503Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1197953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1198051Z     return self._compile_to_module()
2025-12-04T10:35:21.1198461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1198606Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1199044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1199199Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1199617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1199873Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1200370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1200478Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1200901Z   File "/tmp/tmp6y97_p_5/cn/ccnytioj4573pvampbf3surjtlxtfpnbqbakpi2at3gdi7stnwrg.py", line 51, in <module>
2025-12-04T10:35:21.1201291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1201377Z     kernel.precompile(
2025-12-04T10:35:21.1201856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1201955Z     self._precompile_worker()
2025-12-04T10:35:21.1202458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1202613Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1203114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1203285Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1203666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1203871Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1204250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1204531Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1204727Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1204990Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1205091Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1205210Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1205299Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1205372Z     x0 = xindex
2025-12-04T10:35:21.1205476Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1205570Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1205638Z            ^
2025-12-04T10:35:21.1206012Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1206017Z 
2025-12-04T10:35:21.1206624Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1206630Z 
2025-12-04T10:35:21.1206634Z 
2025-12-04T10:35:21.1206819Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1207496Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1207541Z 
2025-12-04T10:35:21.1207996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1208202Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1208286Z frames [('total', 1)]
2025-12-04T10:35:21.1208379Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1208784Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1209056Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1209139Z graph_break []
2025-12-04T10:35:21.1209315Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1209396Z frames [('total', 1)]
2025-12-04T10:35:21.1209544Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1209724Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1210125Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1210202Z graph_break []
2025-12-04T10:35:21.1210376Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1210459Z frames [('total', 1)]
2025-12-04T10:35:21.1210548Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1210737Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1211128Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1211206Z graph_break []
2025-12-04T10:35:21.1211767Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml -
2025-12-04T10:35:21.1211910Z =========================== short test summary info ============================
2025-12-04T10:35:21.1212561Z FAILED [0.3262s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1212837Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1212937Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1213056Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1213142Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1213212Z     x0 = xindex
2025-12-04T10:35:21.1213312Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1213406Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1213475Z            ^
2025-12-04T10:35:21.1213804Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1213812Z 
2025-12-04T10:35:21.1214414Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1214421Z 
2025-12-04T10:35:21.1214425Z 
2025-12-04T10:35:21.1214608Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1215349Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1215356Z 
2025-12-04T10:35:21.1215581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1215757Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1215946Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ==================
2025-12-04T10:35:21.1216025Z Got exit code 1
2025-12-04T10:35:21.1216111Z Retrying single test...
2025-12-04T10:35:21.1216511Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml
2025-12-04T10:35:21.1216707Z ============================= test session starts ==============================
2025-12-04T10:35:21.1216998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1217086Z cachedir: .pytest_cache
2025-12-04T10:35:21.1217531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1217636Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1217727Z configfile: pytest.ini
2025-12-04T10:35:21.1218248Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1218436Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.1219126Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1219221Z Running 1 items in this shard
2025-12-04T10:35:21.1219225Z 
2025-12-04T10:35:21.1220163Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1220805Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1221268Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1221736Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1222155Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1222517Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1222972Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1223420Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1223845Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1228655Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1229129Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1229606Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1230084Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1230462Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1232026Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1232488Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1233264Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1233705Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1234459Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1235072Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1235844Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1236286Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1237009Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1237554Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1238302Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1239004Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1239733Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1240337Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1241066Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1241657Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1242418Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1242776Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1243357Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1243673Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1244128Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1245025Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1245603Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1246363Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1246989Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1247743Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1248451Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1248977Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1249631Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1250099Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1250575Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1251006Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1251374Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1251844Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1252293Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1252648Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1253355Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1253471Z ('RERUN', {'yellow': True}) [1.7878s] [100%]
2025-12-04T10:35:21.1254415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1255105Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1255582Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1256059Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1256480Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1256852Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1257357Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1257807Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1258235Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1258704Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1259212Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1259726Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1260197Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1260507Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1262054Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1262526Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1263262Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1263702Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1264414Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1265028Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1265758Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1266254Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1267013Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1267562Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1268308Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1269009Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1269804Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1270401Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1271170Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1271755Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1272555Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1272868Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1273448Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1273760Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1274209Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1275098Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1275636Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1276392Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1276975Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1277724Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1278389Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1278918Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1279614Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1280082Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1280557Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1280983Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1281347Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1281859Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1282314Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1282670Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1283419Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1283569Z ('RERUN', {'yellow': True}) [0.3297s] [100%]
2025-12-04T10:35:21.1284515Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1285163Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1285633Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1286113Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1286533Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1286907Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1287364Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1287821Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1288254Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1288692Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1289129Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1289591Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1290064Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1290371Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1291952Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1292417Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1293154Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1293642Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1294355Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1295003Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1295752Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1296341Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1297063Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1297607Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1298359Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1299108Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1299842Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1300440Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1301174Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1301758Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1302513Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1302826Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1303402Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1303757Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1304216Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1305103Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1305648Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1306500Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1307086Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1308164Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1308841Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1309425Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1310073Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1310541Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1311022Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1311444Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1311812Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1312286Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1312729Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1313080Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1313790Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1313881Z FAILED [0.3283s] [100%]
2025-12-04T10:35:21.1313886Z 
2025-12-04T10:35:21.1314025Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1314311Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1314416Z Traceback (most recent call last):
2025-12-04T10:35:21.1314744Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1314861Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1315288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1315612Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1316061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1316240Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1316678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1316810Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1317279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1317616Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1318066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1318198Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1318611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1318764Z     return self._compile_to_module()
2025-12-04T10:35:21.1319179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1319366Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1319827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1319949Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1320377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1320581Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1321093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1321219Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1321659Z   File "/tmp/tmpbxy29ftn/ja/cjafqmc53vipi3ljoyzauf6nmdu7kws535ie5ara563rhupojgd2.py", line 51, in <module>
2025-12-04T10:35:21.1322066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1322165Z     kernel.precompile(
2025-12-04T10:35:21.1322639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1322747Z     self._precompile_worker()
2025-12-04T10:35:21.1323257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1323411Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1323935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1324110Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1324503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1324715Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1325095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1325395Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1328254Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1328608Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1328757Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1328956Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1329058Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1329138Z     x0 = xindex
2025-12-04T10:35:21.1329241Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1329345Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1329420Z            ^
2025-12-04T10:35:21.1329749Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1329758Z 
2025-12-04T10:35:21.1330373Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1330379Z 
2025-12-04T10:35:21.1330404Z 
2025-12-04T10:35:21.1330587Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1331275Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1331280Z 
2025-12-04T10:35:21.1331511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1331740Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1331841Z frames [('total', 1)]
2025-12-04T10:35:21.1331936Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1332388Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1332579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1332660Z graph_break []
2025-12-04T10:35:21.1332933Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1333041Z Traceback (most recent call last):
2025-12-04T10:35:21.1333355Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1333467Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1333882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1334100Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1334540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1334704Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1335147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1335267Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1335724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1336007Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1336448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1336575Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1336981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1337085Z     return self._compile_to_module()
2025-12-04T10:35:21.1337497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1337636Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1338087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1338285Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1338755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1338955Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1339521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1339630Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1340059Z   File "/tmp/tmprxf_h57j/ma/cma4karkspr5tydudgs6yi6ovkwhsh4sfaofr7da5a74lbo5nhku.py", line 51, in <module>
2025-12-04T10:35:21.1340455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1340552Z     kernel.precompile(
2025-12-04T10:35:21.1341027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1341122Z     self._precompile_worker()
2025-12-04T10:35:21.1341635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1341892Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1342489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1342654Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1343072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1343283Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1343654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1343939Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1344144Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1344414Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1344520Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1344637Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1344725Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1344806Z     x0 = xindex
2025-12-04T10:35:21.1344907Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1345005Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1345088Z            ^
2025-12-04T10:35:21.1345418Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1345423Z 
2025-12-04T10:35:21.1346038Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1346045Z 
2025-12-04T10:35:21.1346049Z 
2025-12-04T10:35:21.1346232Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1346910Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1346923Z 
2025-12-04T10:35:21.1347147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1347336Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1347429Z frames [('total', 1)]
2025-12-04T10:35:21.1347521Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1347920Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1348184Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1348266Z graph_break []
2025-12-04T10:35:21.1348492Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1348572Z frames [('total', 1)]
2025-12-04T10:35:21.1348663Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1348856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1349250Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1349329Z graph_break []
2025-12-04T10:35:21.1349458Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1349730Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1349831Z Traceback (most recent call last):
2025-12-04T10:35:21.1350150Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1350258Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1350685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1350895Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1351371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1351545Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1351979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1352172Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1352626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1352901Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1353352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1353476Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1353886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1353996Z     return self._compile_to_module()
2025-12-04T10:35:21.1354406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1354555Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1354992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1355098Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1355526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1355748Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1356283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1356387Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1356823Z   File "/tmp/tmpvuf9ihr7/so/csolmffcaq42dab5nfjqsymkpfghw2wnevyext463tbcu65pheq6.py", line 51, in <module>
2025-12-04T10:35:21.1357223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1357319Z     kernel.precompile(
2025-12-04T10:35:21.1357791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1357942Z     self._precompile_worker()
2025-12-04T10:35:21.1358451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1358647Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1359156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1359325Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1359715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1359925Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1360304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1360590Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1360788Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1361065Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1361169Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1361285Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1361385Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1361501Z     x0 = xindex
2025-12-04T10:35:21.1361609Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1361705Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1361775Z            ^
2025-12-04T10:35:21.1362155Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1362160Z 
2025-12-04T10:35:21.1362766Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1362773Z 
2025-12-04T10:35:21.1362777Z 
2025-12-04T10:35:21.1362965Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1363640Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1363645Z 
2025-12-04T10:35:21.1363875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1364059Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1364145Z frames [('total', 1)]
2025-12-04T10:35:21.1364244Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1364641Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1364828Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1364915Z graph_break []
2025-12-04T10:35:21.1365092Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1365177Z frames [('total', 1)]
2025-12-04T10:35:21.1365275Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1365459Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1365858Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1365940Z graph_break []
2025-12-04T10:35:21.1366119Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1366209Z frames [('total', 1)]
2025-12-04T10:35:21.1366308Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1366489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1366885Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1367016Z graph_break []
2025-12-04T10:35:21.1367572Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml -
2025-12-04T10:35:21.1367764Z =========================== short test summary info ============================
2025-12-04T10:35:21.1368421Z FAILED [0.3283s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1368694Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1368795Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1368911Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1369007Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1369078Z     x0 = xindex
2025-12-04T10:35:21.1369177Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1369277Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1369350Z            ^
2025-12-04T10:35:21.1369682Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1369687Z 
2025-12-04T10:35:21.1370298Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1370303Z 
2025-12-04T10:35:21.1370306Z 
2025-12-04T10:35:21.1370539Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1371212Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1371256Z 
2025-12-04T10:35:21.1371481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1371639Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1371811Z ================== 1 failed, 187 deselected, 2 rerun in 2.48s ==================
2025-12-04T10:35:21.1371893Z Got exit code 1
2025-12-04T10:35:21.1372363Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32
2025-12-04T10:35:21.1372713Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:21.1373121Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml
2025-12-04T10:35:21.1373257Z ============================= test session starts ==============================
2025-12-04T10:35:21.1373552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1373649Z cachedir: .pytest_cache
2025-12-04T10:35:21.1374093Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1374199Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1374284Z configfile: pytest.ini
2025-12-04T10:35:21.1374746Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1374937Z collecting ... collected 188 items / 64 deselected / 124 selected
2025-12-04T10:35:21.1375052Z stepcurrent: skipping 64 already run items.
2025-12-04T10:35:21.1375145Z Running 124 items in this shard
2025-12-04T10:35:21.1375149Z 
2025-12-04T10:35:21.1376156Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1376799Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1377311Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1377822Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1378244Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1378604Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1379122Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1379568Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1379992Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1380426Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1380849Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1381349Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1381814Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1382162Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1383707Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1384163Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1384898Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1385325Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1386030Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1386643Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1387364Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1387794Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1388506Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1389092Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1389863Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1390562Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1391281Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1391868Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1392586Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1393228Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1393984Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1394323Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1394896Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1395199Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1395649Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1396595Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1397125Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1397879Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1398453Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1399197Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1399853Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1400372Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1401012Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1401522Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1402042Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1402457Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1402813Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1403271Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1403714Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1404067Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1404762Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1404869Z ('RERUN', {'yellow': True}) [1.7767s] [  0%]
2025-12-04T10:35:21.1405865Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1406541Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1406998Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1407475Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1408135Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1408516Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1408971Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1409416Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1409840Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1410272Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1410695Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1411157Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1411617Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1411915Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1413542Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1414066Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1414798Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1415222Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1415921Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1416602Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1417395Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1417827Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1418595Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1419191Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1419935Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1420630Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1421345Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1421937Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1422653Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1423234Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1423992Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1424291Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1424865Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1425168Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1425670Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1426645Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1427180Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1427943Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1428519Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1429274Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1429965Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1430483Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1431170Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1431630Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1432108Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1432523Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1432882Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1433341Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1433783Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1434129Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1434824Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1434933Z ('RERUN', {'yellow': True}) [0.3321s] [  0%]
2025-12-04T10:35:21.1435883Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1436520Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1436984Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1437502Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1437985Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1438345Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1438801Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1439241Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1439666Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1440101Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1440530Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1440994Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1441494Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1441792Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1443370Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1443831Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1444567Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1444991Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1445695Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1446294Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1447014Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1447444Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1448155Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1448695Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1449474Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1450202Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1450921Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1451508Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1452220Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1452801Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1453596Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1453894Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1454505Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1454805Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1455256Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1456183Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1456726Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1457477Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1458054Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1458796Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1459568Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1460095Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1460736Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1461198Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1461724Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1462177Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1462538Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1462999Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1463437Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1463790Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1464486Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1464570Z FAILED [0.3291s] [  0%]
2025-12-04T10:35:21.1464575Z 
2025-12-04T10:35:21.1464702Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1464977Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1465118Z Traceback (most recent call last):
2025-12-04T10:35:21.1465426Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1465528Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1466011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1466245Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1466680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1466843Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1467274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1467395Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1467845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1468116Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1468564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1468684Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1469092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1469189Z     return self._compile_to_module()
2025-12-04T10:35:21.1469598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1469736Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1470171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1470279Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1470702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1470900Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1471404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1471555Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1471995Z   File "/tmp/tmpohe9brhe/jv/cjvmuecqquthpzzvzvd4jlivnlyxyg64sgspv7borqwcgjjmcwau.py", line 51, in <module>
2025-12-04T10:35:21.1472433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1472520Z     kernel.precompile(
2025-12-04T10:35:21.1472997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1473097Z     self._precompile_worker()
2025-12-04T10:35:21.1473600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1473752Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1474254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1474419Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1474801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1475011Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1475387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1475792Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1475989Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1476298Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1476396Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1476514Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1476600Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1476675Z     x0 = xindex
2025-12-04T10:35:21.1476774Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1476865Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1476934Z            ^
2025-12-04T10:35:21.1477272Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1477277Z 
2025-12-04T10:35:21.1477891Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1477896Z 
2025-12-04T10:35:21.1477900Z 
2025-12-04T10:35:21.1478082Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1478770Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1478775Z 
2025-12-04T10:35:21.1478998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1479190Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1479269Z frames [('total', 1)]
2025-12-04T10:35:21.1479364Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1479761Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1479948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1480033Z graph_break []
2025-12-04T10:35:21.1480307Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1480409Z Traceback (most recent call last):
2025-12-04T10:35:21.1480721Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1480821Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1481235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1481525Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1481998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1482160Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1482592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1482713Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1483167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1483439Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1483881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1484004Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1484410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1484513Z     return self._compile_to_module()
2025-12-04T10:35:21.1484960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1485099Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1485534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1485681Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1486107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1486297Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1486797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1486901Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1487337Z   File "/tmp/tmphekkkgde/vg/cvghy7lbglwjfqtrnzdaes42a2pji3o5xqbzwdatfrd3jf2mhnii.py", line 51, in <module>
2025-12-04T10:35:21.1487733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1487822Z     kernel.precompile(
2025-12-04T10:35:21.1488293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1488396Z     self._precompile_worker()
2025-12-04T10:35:21.1488900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1489060Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1489562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1489729Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1490109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1490313Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1490687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1490970Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1491161Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1491430Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1491581Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1491696Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1491785Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1491898Z     x0 = xindex
2025-12-04T10:35:21.1492002Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1492101Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1492173Z            ^
2025-12-04T10:35:21.1492511Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1492516Z 
2025-12-04T10:35:21.1493122Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1493129Z 
2025-12-04T10:35:21.1493132Z 
2025-12-04T10:35:21.1493319Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1494012Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1494019Z 
2025-12-04T10:35:21.1494243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1494425Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1494513Z frames [('total', 1)]
2025-12-04T10:35:21.1494649Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1495053Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1495279Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1495364Z graph_break []
2025-12-04T10:35:21.1495542Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1495625Z frames [('total', 1)]
2025-12-04T10:35:21.1495719Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1495901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1496299Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1496380Z graph_break []
2025-12-04T10:35:21.1496501Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1496783Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1496883Z Traceback (most recent call last):
2025-12-04T10:35:21.1497198Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1497312Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1497722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1497926Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1498365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1498526Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1498961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1499128Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1499581Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1499856Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1500297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1500418Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1500878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1500975Z     return self._compile_to_module()
2025-12-04T10:35:21.1501429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1501564Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1502004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1502114Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1502534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1502727Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1503222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1503327Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1503764Z   File "/tmp/tmpsujxd4mo/2h/c2hmwuahhhp3slo3a6xbqzk2k6tjrqeutk346ynmsnxwrrha7lyn.py", line 51, in <module>
2025-12-04T10:35:21.1504155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1504246Z     kernel.precompile(
2025-12-04T10:35:21.1504757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1504851Z     self._precompile_worker()
2025-12-04T10:35:21.1505402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1505548Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1506055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1506224Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1506602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1506807Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1507183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1507468Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1507667Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1508102Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1508207Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1508319Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1508406Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1508481Z     x0 = xindex
2025-12-04T10:35:21.1508579Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1508673Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1508749Z            ^
2025-12-04T10:35:21.1509075Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1509080Z 
2025-12-04T10:35:21.1509693Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1509701Z 
2025-12-04T10:35:21.1509705Z 
2025-12-04T10:35:21.1509889Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1510573Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1510665Z 
2025-12-04T10:35:21.1510893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1511070Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1511211Z frames [('total', 1)]
2025-12-04T10:35:21.1511304Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1511706Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1511892Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1511971Z graph_break []
2025-12-04T10:35:21.1512148Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1512231Z frames [('total', 1)]
2025-12-04T10:35:21.1512320Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1512501Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1512893Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1512973Z graph_break []
2025-12-04T10:35:21.1513153Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1513232Z frames [('total', 1)]
2025-12-04T10:35:21.1513321Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1513567Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1513958Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1514041Z graph_break []
2025-12-04T10:35:21.1514678Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml -
2025-12-04T10:35:21.1514818Z =========================== short test summary info ============================
2025-12-04T10:35:21.1515493Z FAILED [0.3291s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1515765Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1515870Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1515983Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1516069Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1516143Z     x0 = xindex
2025-12-04T10:35:21.1516239Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1521085Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1521185Z            ^
2025-12-04T10:35:21.1521529Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1521535Z 
2025-12-04T10:35:21.1522145Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1522162Z 
2025-12-04T10:35:21.1522166Z 
2025-12-04T10:35:21.1522354Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1523052Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1523057Z 
2025-12-04T10:35:21.1523292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1523447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1523623Z ================== 1 failed, 64 deselected, 2 rerun in 2.47s ===================
2025-12-04T10:35:21.1523710Z Got exit code 1
2025-12-04T10:35:21.1523803Z Retrying single test...
2025-12-04T10:35:21.1524212Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml
2025-12-04T10:35:21.1524422Z ============================= test session starts ==============================
2025-12-04T10:35:21.1524724Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1524865Z cachedir: .pytest_cache
2025-12-04T10:35:21.1525322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1525440Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1525537Z configfile: pytest.ini
2025-12-04T10:35:21.1526000Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1526198Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.1526819Z stepcurrent: skipping 64 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1526922Z Running 1 items in this shard
2025-12-04T10:35:21.1526931Z 
2025-12-04T10:35:21.1527889Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1528574Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1529042Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1529551Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1529970Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1530331Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1530794Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1531242Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1531668Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1532114Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1532538Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1532998Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1533460Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1533759Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1535310Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1535815Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1536599Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1537030Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1537748Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1538352Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1539133Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1539570Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1540325Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1540865Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1541636Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1542341Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1543053Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1543640Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1544358Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1544938Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1545703Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1546053Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1546629Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1546924Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1547376Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1548272Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1548886Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1549645Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1550216Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1550967Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1551623Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1552143Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1552829Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1553289Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1553810Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1554230Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1554593Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1555053Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1555497Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1555865Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1556597Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1556705Z ('RERUN', {'yellow': True}) [1.7907s] [100%]
2025-12-04T10:35:21.1557660Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1558303Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1558763Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1559234Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1559665Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1560024Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1560523Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1561007Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1561434Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1561866Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1562290Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1562747Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1563216Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1563518Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1565126Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1565619Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1566405Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1566836Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1567540Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1568144Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1568863Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1569295Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1570012Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1570554Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1571289Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1571987Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1572784Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1573374Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1574088Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1574668Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1575430Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1575737Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1576360Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1576659Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1577106Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1578038Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1578571Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1579407Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1579980Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1580726Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1581384Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1581906Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1582556Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1583016Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1583492Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1583906Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1584315Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1584820Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1585265Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1585628Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1586324Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1586437Z ('RERUN', {'yellow': True}) [0.3317s] [100%]
2025-12-04T10:35:21.1587396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1588039Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1588544Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1589018Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1589476Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1589838Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1590293Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1590741Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1591162Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1591596Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1592023Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1592482Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1592944Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1593243Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1594791Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1595252Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1596037Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1596548Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1597252Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1597858Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1598583Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1599013Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1599729Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1600311Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1601048Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1601778Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1602503Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1603094Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1603815Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1604394Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1605153Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1605458Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1606060Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1606391Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1606838Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1607726Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1608424Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1609338Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1609915Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1610658Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1611329Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1611851Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1612494Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1613016Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1613498Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1613967Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1614326Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1614789Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1615229Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1615581Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1616333Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1616420Z FAILED [0.3315s] [100%]
2025-12-04T10:35:21.1616424Z 
2025-12-04T10:35:21.1616547Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1616823Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1616936Z Traceback (most recent call last):
2025-12-04T10:35:21.1617249Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1617353Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1617776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1617991Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1618428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1618596Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1619076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1619205Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1619663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1619990Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1620487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1620610Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1621022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1621124Z     return self._compile_to_module()
2025-12-04T10:35:21.1621535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1621675Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1622113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1622229Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1622646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1622849Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1623543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1623695Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1624249Z   File "/tmp/tmpjib6zihx/vn/cvncu4cmth7bwqhqbuxaqrwjz5bymnvrdyzawn76e3doqn5z6lf3.py", line 51, in <module>
2025-12-04T10:35:21.1624747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1624841Z     kernel.precompile(
2025-12-04T10:35:21.1625324Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1625430Z     self._precompile_worker()
2025-12-04T10:35:21.1625987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1626140Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1626646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1626821Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1627208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1627419Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1627798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1628082Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1628277Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1628551Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1628654Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1628773Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1628863Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1628941Z     x0 = xindex
2025-12-04T10:35:21.1629048Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1629142Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1629219Z            ^
2025-12-04T10:35:21.1629550Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1629556Z 
2025-12-04T10:35:21.1630164Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1630221Z 
2025-12-04T10:35:21.1630225Z 
2025-12-04T10:35:21.1630415Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1631145Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1631153Z 
2025-12-04T10:35:21.1631382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1631565Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1631655Z frames [('total', 1)]
2025-12-04T10:35:21.1631751Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1632155Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1632341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1632430Z graph_break []
2025-12-04T10:35:21.1632705Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1632808Z Traceback (most recent call last):
2025-12-04T10:35:21.1633127Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1633229Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1633691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1633900Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1634375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1634541Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1634971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1635101Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1635554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1635850Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1636325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1636446Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1636856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1636959Z     return self._compile_to_module()
2025-12-04T10:35:21.1637467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1637614Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1638051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1638163Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1638594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1638788Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1639293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1639399Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1639835Z   File "/tmp/tmp1piy4oy2/yj/cyjntxgpam3ussp25tu34tk337frgpermhbqgb4umn3fipyxqin3.py", line 51, in <module>
2025-12-04T10:35:21.1640231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1640367Z     kernel.precompile(
2025-12-04T10:35:21.1640886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1640986Z     self._precompile_worker()
2025-12-04T10:35:21.1641492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1641645Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1642148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1642317Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1642699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1642908Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1643292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1643576Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1643772Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1644092Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1644195Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1644311Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1644451Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1644528Z     x0 = xindex
2025-12-04T10:35:21.1644634Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1644732Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1644802Z            ^
2025-12-04T10:35:21.1645135Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1645140Z 
2025-12-04T10:35:21.1645752Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1645757Z 
2025-12-04T10:35:21.1645761Z 
2025-12-04T10:35:21.1645952Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1646641Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1646649Z 
2025-12-04T10:35:21.1646871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1647058Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1647143Z frames [('total', 1)]
2025-12-04T10:35:21.1647247Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1647652Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1647840Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1647923Z graph_break []
2025-12-04T10:35:21.1648101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1648186Z frames [('total', 1)]
2025-12-04T10:35:21.1648283Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1648463Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1648862Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1648951Z graph_break []
2025-12-04T10:35:21.1649074Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1649350Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1649525Z Traceback (most recent call last):
2025-12-04T10:35:21.1649841Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1650076Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1650495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1650711Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1651163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1651327Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1651767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1651887Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1652345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1652625Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1653069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1653244Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1653655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1653800Z     return self._compile_to_module()
2025-12-04T10:35:21.1654217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1654355Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1654800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1654916Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1655340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1655540Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1656089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1656193Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1656647Z   File "/tmp/tmpm20ydfom/jx/cjxc5kdxyvsf2tfwnsber2e3zuomavoytzyrnjwv6gqktkcc32iz.py", line 51, in <module>
2025-12-04T10:35:21.1657041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1657135Z     kernel.precompile(
2025-12-04T10:35:21.1657610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1657704Z     self._precompile_worker()
2025-12-04T10:35:21.1658216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1658368Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1658878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1659102Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1659487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1659697Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1660067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1660399Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1660643Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1660912Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1661023Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1661136Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1661227Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1661308Z     x0 = xindex
2025-12-04T10:35:21.1661409Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1661507Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1661593Z            ^
2025-12-04T10:35:21.1661923Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1661928Z 
2025-12-04T10:35:21.1662547Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1662551Z 
2025-12-04T10:35:21.1662555Z 
2025-12-04T10:35:21.1662739Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1663479Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1663490Z 
2025-12-04T10:35:21.1663717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1663939Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1664031Z frames [('total', 1)]
2025-12-04T10:35:21.1664130Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1664528Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1664718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1664796Z graph_break []
2025-12-04T10:35:21.1664981Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1665061Z frames [('total', 1)]
2025-12-04T10:35:21.1665152Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1665339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1665734Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1665814Z graph_break []
2025-12-04T10:35:21.1665997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1666078Z frames [('total', 1)]
2025-12-04T10:35:21.1666171Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1666358Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1666750Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1666832Z graph_break []
2025-12-04T10:35:21.1667387Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml -
2025-12-04T10:35:21.1667528Z =========================== short test summary info ============================
2025-12-04T10:35:21.1668205Z FAILED [0.3315s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1668478Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1668578Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1668689Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1668779Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1668906Z     x0 = xindex
2025-12-04T10:35:21.1669008Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1669100Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1669175Z            ^
2025-12-04T10:35:21.1669544Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1669549Z 
2025-12-04T10:35:21.1670169Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1670174Z 
2025-12-04T10:35:21.1670178Z 
2025-12-04T10:35:21.1670357Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1671042Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1671046Z 
2025-12-04T10:35:21.1671276Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1671425Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1671594Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ==================
2025-12-04T10:35:21.1671671Z Got exit code 1
2025-12-04T10:35:21.1671758Z Retrying single test...
2025-12-04T10:35:21.1672205Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml
2025-12-04T10:35:21.1672336Z ============================= test session starts ==============================
2025-12-04T10:35:21.1672628Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1672761Z cachedir: .pytest_cache
2025-12-04T10:35:21.1673204Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1673307Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1673398Z configfile: pytest.ini
2025-12-04T10:35:21.1673856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1674046Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.1674658Z stepcurrent: skipping 64 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1674753Z Running 1 items in this shard
2025-12-04T10:35:21.1674758Z 
2025-12-04T10:35:21.1675709Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1676401Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1676871Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1677339Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1677757Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1678114Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1678571Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1679017Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1679490Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1679956Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1680379Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1680843Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1681302Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1681597Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1683182Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1683633Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1684430Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1684852Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1685563Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1686163Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1686882Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1687310Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1688021Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1688563Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1689295Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1689988Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1690699Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1691286Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1692097Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1692678Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1693430Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1693728Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1694302Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1694600Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1695048Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1695973Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1696542Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1697291Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1697867Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1698615Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1699323Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1699848Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1700488Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1700947Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1701419Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1701832Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1702194Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1702650Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1703089Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1703491Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1704228Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1704344Z ('RERUN', {'yellow': True}) [1.7981s] [100%]
2025-12-04T10:35:21.1705292Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1705930Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1706393Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1706863Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1707319Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1707677Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1708289Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1708814Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1709236Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1709668Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1710092Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1710555Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1711012Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1711310Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1712851Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1713307Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1714038Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1714462Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1715171Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1715919Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1716664Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1717089Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1717803Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1718346Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1719079Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1719829Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1720542Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1721171Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1721890Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1722468Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1723221Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1723520Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1724096Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1724394Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1724842Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1725730Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1726262Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1727020Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1727642Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1728436Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1729094Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1729613Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1730256Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1730712Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1731188Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1731667Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1732027Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1732487Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1732967Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1733313Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1734013Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1734122Z ('RERUN', {'yellow': True}) [0.3298s] [100%]
2025-12-04T10:35:21.1735069Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1735706Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1736213Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1736684Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1737100Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1737458Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1737915Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1738362Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1738784Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.1739307Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.1739770Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.1740229Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.1740691Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.1740989Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1742534Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1742985Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1743754Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1744213Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1744916Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.1745516Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.1746235Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.1746662Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return fn(*args, **kwargs)
2025-12-04T10:35:21.1747371Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.1747913Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.1748648Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.1749422Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.1750133Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.1750721Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.1751433Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.1752099Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.1752852Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1753149Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1753725Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.1754020Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1754469Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1755355Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1755927Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1756680Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1757290Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1758040Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1758693Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1759209Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.1759857Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1760313Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1760790Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1761209Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = xindex < xnumel
2025-12-04T10:35:21.1761569Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     x0 = xindex
2025-12-04T10:35:21.1762025Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1762466Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1762812Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]            ^
2025-12-04T10:35:21.1763548Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1763674Z FAILED [0.3281s] [100%]
2025-12-04T10:35:21.1763682Z 
2025-12-04T10:35:21.1763800Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1764077Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1764185Z Traceback (most recent call last):
2025-12-04T10:35:21.1764492Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1764597Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1765013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1765219Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1765660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1765819Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1766251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1766418Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1766868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1767136Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1767630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1767753Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1768161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1768258Z     return self._compile_to_module()
2025-12-04T10:35:21.1768668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1768809Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1769248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1769357Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1769774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1769970Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1770469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1770575Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1771010Z   File "/tmp/tmpe3k2j0xr/rx/crxsbidmuj75eeq4quiyndijtubzbohtghod3g2vcmjhcaiv5e3i.py", line 51, in <module>
2025-12-04T10:35:21.1771406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1771495Z     kernel.precompile(
2025-12-04T10:35:21.1771967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1772059Z     self._precompile_worker()
2025-12-04T10:35:21.1772568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1772717Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1773219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1773437Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1773878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1774084Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1774459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1774740Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1774938Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1775203Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1775299Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1775413Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1775502Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1775578Z     x0 = xindex
2025-12-04T10:35:21.1775678Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1775772Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1775847Z            ^
2025-12-04T10:35:21.1776177Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1776182Z 
2025-12-04T10:35:21.1776837Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1776843Z 
2025-12-04T10:35:21.1776885Z 
2025-12-04T10:35:21.1777069Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1777753Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1777760Z 
2025-12-04T10:35:21.1777984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1778163Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1778247Z frames [('total', 1)]
2025-12-04T10:35:21.1778343Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1778743Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1778925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1779011Z graph_break []
2025-12-04T10:35:21.1779335Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1779439Z Traceback (most recent call last):
2025-12-04T10:35:21.1779753Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1779852Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1780267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1780471Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1780907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1781073Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1781502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1781622Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1782074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1782341Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1782833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1782952Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1783402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1783507Z     return self._compile_to_module()
2025-12-04T10:35:21.1783919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1784059Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1784496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1784605Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1785023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1785221Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1785723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1785825Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1786282Z   File "/tmp/tmpiq1_n3h8/ye/cyelgf577euwxblutrz6ac4dsqsrig56dzg54ilfdqpuceroxu5d.py", line 51, in <module>
2025-12-04T10:35:21.1786675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1786762Z     kernel.precompile(
2025-12-04T10:35:21.1787272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1787366Z     self._precompile_worker()
2025-12-04T10:35:21.1787874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1788025Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1788531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1788699Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1789081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1789281Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1789654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1789935Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1790132Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1790402Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1790504Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1790618Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1790707Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1790779Z     x0 = xindex
2025-12-04T10:35:21.1790877Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1790973Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1791047Z            ^
2025-12-04T10:35:21.1791378Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1791383Z 
2025-12-04T10:35:21.1791990Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1791995Z 
2025-12-04T10:35:21.1792000Z 
2025-12-04T10:35:21.1792183Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1792912Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1792918Z 
2025-12-04T10:35:21.1793183Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1793366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1793451Z frames [('total', 1)]
2025-12-04T10:35:21.1793547Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1793944Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1794130Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1794212Z graph_break []
2025-12-04T10:35:21.1794386Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1794470Z frames [('total', 1)]
2025-12-04T10:35:21.1794567Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1794747Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1795143Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1795221Z graph_break []
2025-12-04T10:35:21.1795379Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1795656Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1795754Z Traceback (most recent call last):
2025-12-04T10:35:21.1796159Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1796264Z     y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1796675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1796889Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1797321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1797483Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1797921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1798042Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1798494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1798771Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1799209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1799332Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1799736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1799834Z     return self._compile_to_module()
2025-12-04T10:35:21.1800252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1800387Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1800827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1800932Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1801351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1801548Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1802044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1802197Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1802670Z   File "/tmp/tmpakkjjst6/i2/ci2x3fb3ijnofyls2b5v3dzbvgt6g7uziphmuwdohkhqj2y6hre3.py", line 51, in <module>
2025-12-04T10:35:21.1803064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1803159Z     kernel.precompile(
2025-12-04T10:35:21.1803628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1803726Z     self._precompile_worker()
2025-12-04T10:35:21.1804234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1804379Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1804885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1805048Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1805425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1805678Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1806101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1806382Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1806617Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1806881Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1806985Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1807099Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1807185Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1807263Z     x0 = xindex
2025-12-04T10:35:21.1807364Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1807459Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1807536Z            ^
2025-12-04T10:35:21.1808169Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1808176Z 
2025-12-04T10:35:21.1808785Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1808793Z 
2025-12-04T10:35:21.1808797Z 
2025-12-04T10:35:21.1808974Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1809666Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1813363Z 
2025-12-04T10:35:21.1813611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1813797Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1813888Z frames [('total', 1)]
2025-12-04T10:35:21.1813984Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1814385Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1814578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1814659Z graph_break []
2025-12-04T10:35:21.1814842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1814928Z frames [('total', 1)]
2025-12-04T10:35:21.1815022Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1815212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1815733Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1815813Z graph_break []
2025-12-04T10:35:21.1816060Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1816147Z frames [('total', 1)]
2025-12-04T10:35:21.1816239Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1816432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1816822Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1816917Z graph_break []
2025-12-04T10:35:21.1817480Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml -
2025-12-04T10:35:21.1817624Z =========================== short test summary info ============================
2025-12-04T10:35:21.1818311Z FAILED [0.3281s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1818585Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1818696Z     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1818870Z     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1818959Z     xmask = xindex < xnumel
2025-12-04T10:35:21.1819100Z     x0 = xindex
2025-12-04T10:35:21.1819202Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1819361Z     tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1819442Z            ^
2025-12-04T10:35:21.1819776Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1819781Z 
2025-12-04T10:35:21.1820395Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1820403Z 
2025-12-04T10:35:21.1820406Z 
2025-12-04T10:35:21.1820590Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1821279Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1821291Z 
2025-12-04T10:35:21.1821516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1821666Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1821842Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ==================
2025-12-04T10:35:21.1821928Z Got exit code 1
2025-12-04T10:35:21.1822402Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1822760Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:21.1823160Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml
2025-12-04T10:35:21.1823301Z ============================= test session starts ==============================
2025-12-04T10:35:21.1823596Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1823688Z cachedir: .pytest_cache
2025-12-04T10:35:21.1824139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1824243Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1824330Z configfile: pytest.ini
2025-12-04T10:35:21.1824793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1825111Z collecting ... collected 188 items / 65 deselected / 123 selected
2025-12-04T10:35:21.1825236Z stepcurrent: skipping 65 already run items.
2025-12-04T10:35:21.1825331Z Running 123 items in this shard
2025-12-04T10:35:21.1825336Z 
2025-12-04T10:35:21.1826279Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1826894Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1827253Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.1827712Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1828188Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1828663Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.1829152Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.1829615Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.1830104Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.1830715Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.1831028Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1832521Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1832982Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1833869Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1834406Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1835167Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1835748Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1836550Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1837203Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1837806Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.1838419Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1838721Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.1839488Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1839599Z ('RERUN', {'yellow': True}) [1.3200s] [  0%]
2025-12-04T10:35:21.1840493Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1841105Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1841506Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.1841970Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1842485Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1842963Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.1843406Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.1843871Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.1844311Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.1844920Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.1845233Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1846769Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1847231Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1848114Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1848653Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1849449Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1850056Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1850810Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1851468Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1851993Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.1852604Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1852915Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.1853713Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1853824Z ('RERUN', {'yellow': True}) [0.2131s] [  0%]
2025-12-04T10:35:21.1854780Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1855387Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1855756Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.1856211Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1856694Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1857169Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.1857610Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.1858077Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.1858516Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.1859192Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.1859494Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1860975Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1861530Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1862417Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1862953Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1863708Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1864296Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1865047Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1865757Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1866273Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.1866926Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1867239Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.1868003Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1868097Z FAILED [0.2105s] [  0%]
2025-12-04T10:35:21.1868101Z 
2025-12-04T10:35:21.1868223Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1868467Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.1868577Z Traceback (most recent call last):
2025-12-04T10:35:21.1868947Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.1869054Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.1869466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1869676Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1870121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1870284Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1870719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1870848Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1871300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1871580Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1872020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1872200Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1872613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1872755Z     return self._compile_to_module()
2025-12-04T10:35:21.1873171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1873309Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1873749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1873867Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1874283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1874476Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1874988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1875093Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1875536Z   File "/tmp/tmpgol57jy2/w3/cw3nv2awzzn4lic4te2mw6rkulsdb7isc3oaearntin6mzockj6z.py", line 45, in <module>
2025-12-04T10:35:21.1875996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1876099Z     kernel.precompile(
2025-12-04T10:35:21.1876590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1876729Z     self._precompile_worker()
2025-12-04T10:35:21.1877243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1877395Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1877905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1878080Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1878459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1878670Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1879042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1879327Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1879523Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.1879759Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1879834Z ^
2025-12-04T10:35:21.1880234Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1880239Z 
2025-12-04T10:35:21.1880848Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1880853Z 
2025-12-04T10:35:21.1880857Z 
2025-12-04T10:35:21.1881044Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1881656Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1881664Z 
2025-12-04T10:35:21.1881892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1882070Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1882162Z frames [('total', 1)]
2025-12-04T10:35:21.1882311Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1882514Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1882741Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1882827Z graph_break []
2025-12-04T10:35:21.1883069Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.1883180Z Traceback (most recent call last):
2025-12-04T10:35:21.1883552Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.1883649Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.1884072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1884284Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1884718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1884890Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1885322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1885453Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1885949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1886263Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1886762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1886883Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1887306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1887410Z     return self._compile_to_module()
2025-12-04T10:35:21.1887816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1887965Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1888404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1888515Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1888946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1889147Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1889648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1889755Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1890195Z   File "/tmp/tmpbbu2cx9k/vo/cvo6hyaq4qnyygkgomzus5xpmcnywbqcr3zph3u3yc3xx4d7y4vt.py", line 45, in <module>
2025-12-04T10:35:21.1890601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1890693Z     kernel.precompile(
2025-12-04T10:35:21.1891186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1891287Z     self._precompile_worker()
2025-12-04T10:35:21.1891791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1891949Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1892457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1892622Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1893052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1893292Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1893675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1893957Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1894148Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.1894389Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1894462Z ^
2025-12-04T10:35:21.1894850Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1894867Z 
2025-12-04T10:35:21.1895479Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1895486Z 
2025-12-04T10:35:21.1895490Z 
2025-12-04T10:35:21.1895680Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1896422Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1896427Z 
2025-12-04T10:35:21.1896654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1896839Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1896966Z frames [('total', 1)]
2025-12-04T10:35:21.1897058Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1897272Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1897458Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1897537Z graph_break []
2025-12-04T10:35:21.1897724Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1897807Z frames [('total', 1)]
2025-12-04T10:35:21.1897907Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1898090Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1898294Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1898382Z graph_break []
2025-12-04T10:35:21.1898499Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1898739Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.1898848Z Traceback (most recent call last):
2025-12-04T10:35:21.1899292Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.1899396Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.1899811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1900017Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1900465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1900633Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1901067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1901192Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1901645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1901920Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1902407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1902528Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1902983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1903082Z     return self._compile_to_module()
2025-12-04T10:35:21.1903499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1903635Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1904073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1904184Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1904600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1904795Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1905306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1905411Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1905873Z   File "/tmp/tmpprd8y_q3/7q/c7qgjtpfsa6thos3owiuex6xlfrxs4bj7vygfeco5elq246qfywl.py", line 45, in <module>
2025-12-04T10:35:21.1906264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1906353Z     kernel.precompile(
2025-12-04T10:35:21.1906866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1906960Z     self._precompile_worker()
2025-12-04T10:35:21.1907471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1907620Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1908358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1908591Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1909100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1909368Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1909841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1910130Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1910333Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.1910570Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1910639Z ^
2025-12-04T10:35:21.1911033Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1911040Z 
2025-12-04T10:35:21.1911647Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1911652Z 
2025-12-04T10:35:21.1911656Z 
2025-12-04T10:35:21.1911842Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1912454Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1912461Z 
2025-12-04T10:35:21.1912703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1912881Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1913061Z frames [('total', 1)]
2025-12-04T10:35:21.1913163Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1913421Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1913608Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1913690Z graph_break []
2025-12-04T10:35:21.1913869Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1913953Z frames [('total', 1)]
2025-12-04T10:35:21.1914053Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1914234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1914439Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1914518Z graph_break []
2025-12-04T10:35:21.1914696Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1914787Z frames [('total', 1)]
2025-12-04T10:35:21.1914879Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1915062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1915264Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1915342Z graph_break []
2025-12-04T10:35:21.1916007Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml -
2025-12-04T10:35:21.1916148Z =========================== short test summary info ============================
2025-12-04T10:35:21.1916761Z FAILED [0.2105s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.1917078Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1917148Z ^
2025-12-04T10:35:21.1917534Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1917548Z 
2025-12-04T10:35:21.1918156Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1918160Z 
2025-12-04T10:35:21.1918164Z 
2025-12-04T10:35:21.1918348Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1918971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1918975Z 
2025-12-04T10:35:21.1919202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1919364Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.1919529Z ================== 1 failed, 65 deselected, 2 rerun in 1.78s ===================
2025-12-04T10:35:21.1919614Z Got exit code 1
2025-12-04T10:35:21.1919707Z Retrying single test...
2025-12-04T10:35:21.1920109Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml
2025-12-04T10:35:21.1920243Z ============================= test session starts ==============================
2025-12-04T10:35:21.1920543Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.1920633Z cachedir: .pytest_cache
2025-12-04T10:35:21.1921088Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.1921194Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.1921282Z configfile: pytest.ini
2025-12-04T10:35:21.1921747Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.1921938Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.1922534Z stepcurrent: skipping 65 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1922721Z Running 1 items in this shard
2025-12-04T10:35:21.1923273Z 
2025-12-04T10:35:21.1924178Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1924796Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1925156Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.1925620Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1926094Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1926568Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.1927060Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.1927525Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.1928011Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.1928622Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.1928925Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1930420Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1930878Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1931768Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1932305Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1933072Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1933650Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1934406Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1935107Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1935692Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.1936311Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1936619Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.1937384Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1937494Z ('RERUN', {'yellow': True}) [1.3255s] [100%]
2025-12-04T10:35:21.1938390Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1939128Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1939489Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.1939950Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1940468Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1940944Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.1941385Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.1941845Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.1942299Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.1942913Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.1943221Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1944700Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1945163Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1946051Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1946587Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1947429Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1948011Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1948761Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1949422Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1949944Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.1950559Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1950864Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.1951661Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1951811Z ('RERUN', {'yellow': True}) [0.2116s] [100%]
2025-12-04T10:35:21.1952693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.1953300Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1953663Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.1954126Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1954594Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1955075Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.1955514Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.1956006Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.1956480Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.1957091Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.1957389Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.1958865Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.1959402Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.1960290Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1960822Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1961577Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1962158Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1962906Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1963601Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1964154Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.1964765Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1965075Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.1965836Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1965921Z FAILED [0.2096s] [100%]
2025-12-04T10:35:21.1965925Z 
2025-12-04T10:35:21.1966053Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1966340Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.1966445Z Traceback (most recent call last):
2025-12-04T10:35:21.1966811Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.1966908Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.1967321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1967535Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1967976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1968139Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1968571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1968694Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1969148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1969431Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1969868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1970037Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1970489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1970587Z     return self._compile_to_module()
2025-12-04T10:35:21.1970996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1971132Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1971576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1971685Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1972104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1972296Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1972798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1972904Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1973335Z   File "/tmp/tmpf546t5lc/gz/cgzmfmc5lfvvjm43n3swwe5bcztxho6q3brd2phy2wzl6xfzx5jb.py", line 45, in <module>
2025-12-04T10:35:21.1973773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1973861Z     kernel.precompile(
2025-12-04T10:35:21.1974335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1974472Z     self._precompile_worker()
2025-12-04T10:35:21.1974976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1975127Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1975629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1975796Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1976173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1976378Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1976760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1977042Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1977233Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.1977465Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1977534Z ^
2025-12-04T10:35:21.1977922Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1977927Z 
2025-12-04T10:35:21.1978531Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1978536Z 
2025-12-04T10:35:21.1978544Z 
2025-12-04T10:35:21.1978729Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1979426Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1979434Z 
2025-12-04T10:35:21.1979656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1979838Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1979989Z frames [('total', 1)]
2025-12-04T10:35:21.1980088Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1980285Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1980508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1980590Z graph_break []
2025-12-04T10:35:21.1980830Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.1980931Z Traceback (most recent call last):
2025-12-04T10:35:21.1981299Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.1981394Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.1981809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1982015Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1982451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1982615Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1983048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1983165Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1983667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1983938Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1984418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1984537Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1984941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1985043Z     return self._compile_to_module()
2025-12-04T10:35:21.1985452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1985587Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1986075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1986184Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1986603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1986796Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1987295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1987400Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1987831Z   File "/tmp/tmps4gi5qmf/ac/cac4hnrw52ut4pur633pmxiwbw6zo36sgiefykofqi45pif2xrtj.py", line 45, in <module>
2025-12-04T10:35:21.1988226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1988313Z     kernel.precompile(
2025-12-04T10:35:21.1988783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1988883Z     self._precompile_worker()
2025-12-04T10:35:21.1989386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1989536Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1990038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1990247Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1990628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1990867Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1991248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1991528Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1991719Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.1991963Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1992035Z ^
2025-12-04T10:35:21.1992419Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.1992424Z 
2025-12-04T10:35:21.1993039Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1993043Z 
2025-12-04T10:35:21.1993047Z 
2025-12-04T10:35:21.1993228Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1993881Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.1993887Z 
2025-12-04T10:35:21.1994110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1994417Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1994501Z frames [('total', 1)]
2025-12-04T10:35:21.1994594Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1994800Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1995000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1995082Z graph_break []
2025-12-04T10:35:21.1995275Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1995362Z frames [('total', 1)]
2025-12-04T10:35:21.1995457Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.1995653Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1995861Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1995945Z graph_break []
2025-12-04T10:35:21.1996071Z =================================== FAILURES ===================================
2025-12-04T10:35:21.1996330Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.1996441Z Traceback (most recent call last):
2025-12-04T10:35:21.1996827Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.1996928Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.1997368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1997588Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1998061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1998230Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1998690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1998825Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1999306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1999597Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.2000112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.2000239Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.2000718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.2000822Z     return self._compile_to_module()
2025-12-04T10:35:21.2001262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.2001413Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.2001881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.2001996Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.2002442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.2002648Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.2003183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.2003294Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.2003804Z   File "/tmp/tmpgu6qsbdx/hs/chszo5w7z5wf3c6rogt4jfxjsd4uvjjy3reehjdpvyf3mli2ider.py", line 45, in <module>
2025-12-04T10:35:21.2004196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.2004324Z     kernel.precompile(
2025-12-04T10:35:21.2004801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.2004895Z     self._precompile_worker()
2025-12-04T10:35:21.2005403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.2005553Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.2006105Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2006272Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2006650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2006850Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2007235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2007514Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2007705Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.2008186Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2008255Z ^
2025-12-04T10:35:21.2008649Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2008654Z 
2025-12-04T10:35:21.2009262Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.2009267Z 
2025-12-04T10:35:21.2009271Z 
2025-12-04T10:35:21.2009452Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.2010063Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2010068Z 
2025-12-04T10:35:21.2010294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.2010550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2010633Z frames [('total', 1)]
2025-12-04T10:35:21.2010730Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2010987Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2011172Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2011256Z graph_break []
2025-12-04T10:35:21.2011432Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2011516Z frames [('total', 1)]
2025-12-04T10:35:21.2011608Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2011790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2011990Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2012076Z graph_break []
2025-12-04T10:35:21.2012251Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2012339Z frames [('total', 1)]
2025-12-04T10:35:21.2012430Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2012607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2012810Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2012885Z graph_break []
2025-12-04T10:35:21.2013499Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml -
2025-12-04T10:35:21.2013648Z =========================== short test summary info ============================
2025-12-04T10:35:21.2014253Z FAILED [0.2096s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.2014572Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2014641Z ^
2025-12-04T10:35:21.2015027Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2015034Z 
2025-12-04T10:35:21.2015641Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.2015646Z 
2025-12-04T10:35:21.2015649Z 
2025-12-04T10:35:21.2015835Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.2016495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2016503Z 
2025-12-04T10:35:21.2016724Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.2016873Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.2017043Z ================== 1 failed, 187 deselected, 2 rerun in 1.78s ==================
2025-12-04T10:35:21.2017126Z Got exit code 1
2025-12-04T10:35:21.2017219Z Retrying single test...
2025-12-04T10:35:21.2017692Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml
2025-12-04T10:35:21.2017827Z ============================= test session starts ==============================
2025-12-04T10:35:21.2018127Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.2018213Z cachedir: .pytest_cache
2025-12-04T10:35:21.2018659Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.2018761Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.2018847Z configfile: pytest.ini
2025-12-04T10:35:21.2019410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.2019647Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.2020187Z stepcurrent: skipping 65 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2020322Z Running 1 items in this shard
2025-12-04T10:35:21.2020327Z 
2025-12-04T10:35:21.2021222Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.2021835Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2022200Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.2022659Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.2023134Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.2023608Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.2024088Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.2024549Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.2025033Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.2025638Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.2025964Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.2027485Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.2027936Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.2028823Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2029356Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2030113Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2030683Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2031435Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2032125Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2032677Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.2033288Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2033590Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.2034352Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2034460Z ('RERUN', {'yellow': True}) [1.3388s] [100%]
2025-12-04T10:35:21.2035346Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.2036033Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2036401Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.2036897Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.2037369Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.2037848Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.2038284Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.2038743Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.2039182Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.2039786Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.2040087Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.2041572Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.2042029Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.2042912Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2043442Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2044307Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2044882Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2045633Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2046286Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2046804Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.2047422Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2047725Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.2048531Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2048679Z ('RERUN', {'yellow': True}) [0.2112s] [100%]
2025-12-04T10:35:21.2049561Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.2050168Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2050527Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:21.2050984Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.2051452Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.2051933Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:21.2052370Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp0 = tl.load(in_ptr0 + (0))
2025-12-04T10:35:21.2052834Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
2025-12-04T10:35:21.2053274Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp2 = tmp1.to(tl.float8e4nv)
2025-12-04T10:35:21.2053883Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None)
2025-12-04T10:35:21.2054183Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:21.2055661Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.2056252Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.2057135Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2057671Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2058426Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2059010Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2059813Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2060533Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2061088Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:21.2061692Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2062001Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.2062757Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2062838Z FAILED [0.2084s] [100%]
2025-12-04T10:35:21.2062849Z 
2025-12-04T10:35:21.2062969Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.2063213Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.2063321Z Traceback (most recent call last):
2025-12-04T10:35:21.2063685Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.2063778Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.2064193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.2064408Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.2064852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.2065016Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.2065447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.2065570Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.2066063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.2066353Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.2066790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.2066960Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.2067410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.2067507Z     return self._compile_to_module()
2025-12-04T10:35:21.2067915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.2068054Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.2068489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.2068599Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.2069013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.2069207Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.2069710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.2069815Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.2070257Z   File "/tmp/tmpfh6xc2uv/h7/ch7xssf6cphasno2hppyyj53qk7amp5apj44vgbcnbvtyemftxuu.py", line 45, in <module>
2025-12-04T10:35:21.2070688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.2070777Z     kernel.precompile(
2025-12-04T10:35:21.2071251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.2071387Z     self._precompile_worker()
2025-12-04T10:35:21.2071890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.2072042Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.2072548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2072717Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2073098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2073298Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2073672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2073955Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2074150Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.2074381Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2074451Z ^
2025-12-04T10:35:21.2074842Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2074847Z 
2025-12-04T10:35:21.2075452Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.2075459Z 
2025-12-04T10:35:21.2075463Z 
2025-12-04T10:35:21.2075649Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.2076259Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2076266Z 
2025-12-04T10:35:21.2076486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.2076664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2076792Z frames [('total', 1)]
2025-12-04T10:35:21.2076887Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2077084Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2077307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2077396Z graph_break []
2025-12-04T10:35:21.2077637Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.2077734Z Traceback (most recent call last):
2025-12-04T10:35:21.2078102Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.2078204Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.2078615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.2078825Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.2079260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.2079421Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.2079852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.2079972Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.2080465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.2080735Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.2081218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.2081335Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.2085394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.2085524Z     return self._compile_to_module()
2025-12-04T10:35:21.2085959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.2086118Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.2086598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.2086709Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.2087134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.2087331Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.2087829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.2087944Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.2088365Z   File "/tmp/tmpv_xfm0hy/cj/ccjvvismqyeuh7hlgw7eqkeh2ngfoksrt4m6l6ndzvgiwwykpz3g.py", line 45, in <module>
2025-12-04T10:35:21.2088763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.2088853Z     kernel.precompile(
2025-12-04T10:35:21.2089332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.2089437Z     self._precompile_worker()
2025-12-04T10:35:21.2089948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.2090097Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.2090615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2090845Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2091234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2091487Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2091865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2092156Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2092347Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.2092594Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2092663Z ^
2025-12-04T10:35:21.2093055Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2093063Z 
2025-12-04T10:35:21.2093673Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.2093678Z 
2025-12-04T10:35:21.2093684Z 
2025-12-04T10:35:21.2093865Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.2094521Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2094527Z 
2025-12-04T10:35:21.2094753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.2094977Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2095075Z frames [('total', 1)]
2025-12-04T10:35:21.2095170Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2095371Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2095557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2095635Z graph_break []
2025-12-04T10:35:21.2095841Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2095934Z frames [('total', 1)]
2025-12-04T10:35:21.2096044Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2096230Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2096427Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2096510Z graph_break []
2025-12-04T10:35:21.2096628Z =================================== FAILURES ===================================
2025-12-04T10:35:21.2096871Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________
2025-12-04T10:35:21.2096972Z Traceback (most recent call last):
2025-12-04T10:35:21.2097342Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel
2025-12-04T10:35:21.2097442Z     actual = torch.compile(f)(x)
2025-12-04T10:35:21.2097862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.2098070Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.2098508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.2098670Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.2099221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.2099352Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.2099802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.2100071Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.2100565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.2100759Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.2101170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.2101270Z     return self._compile_to_module()
2025-12-04T10:35:21.2101678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.2101817Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.2102252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.2102363Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.2102778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.2102973Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.2103477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.2103580Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.2104049Z   File "/tmp/tmpr7h3zpm5/24/c24xvtrwerswq67ic42j5v2wniug2ddaizenwff44lvozr44wsmm.py", line 45, in <module>
2025-12-04T10:35:21.2104447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.2104575Z     kernel.precompile(
2025-12-04T10:35:21.2105050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.2105143Z     self._precompile_worker()
2025-12-04T10:35:21.2105647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.2105801Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.2106307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.2106475Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.2106852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.2107056Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.2107436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.2107716Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.2108069Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.2108314Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2108384Z ^
2025-12-04T10:35:21.2108781Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2108786Z 
2025-12-04T10:35:21.2109391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.2109396Z 
2025-12-04T10:35:21.2109400Z 
2025-12-04T10:35:21.2109589Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.2110205Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2110210Z 
2025-12-04T10:35:21.2110434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.2110698Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2110782Z frames [('total', 1)]
2025-12-04T10:35:21.2110878Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2111137Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2111325Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2111412Z graph_break []
2025-12-04T10:35:21.2111587Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2111670Z frames [('total', 1)]
2025-12-04T10:35:21.2111768Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2111950Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2112145Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2112230Z graph_break []
2025-12-04T10:35:21.2112403Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.2112495Z frames [('total', 1)]
2025-12-04T10:35:21.2112586Z stats [('calls_captured', 1)]
2025-12-04T10:35:21.2112768Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.2112966Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.2113050Z graph_break []
2025-12-04T10:35:21.2113665Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml -
2025-12-04T10:35:21.2113814Z =========================== short test summary info ============================
2025-12-04T10:35:21.2114479Z FAILED [0.2084s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:21.2114720Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.2114792Z ^
2025-12-04T10:35:21.2115189Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:21.2115193Z 
2025-12-04T10:35:21.2115808Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.2115813Z 
2025-12-04T10:35:21.2115817Z 
2025-12-04T10:35:21.2116001Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.2116613Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2116620Z 
2025-12-04T10:35:21.2116843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.2116992Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.2117165Z ================== 1 failed, 187 deselected, 2 rerun in 1.79s ==================
2025-12-04T10:35:21.2117246Z Got exit code 1
2025-12-04T10:35:21.2117655Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda
2025-12-04T10:35:21.2118004Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:21.2118402Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml
2025-12-04T10:35:21.2118547Z ============================= test session starts ==============================
2025-12-04T10:35:21.2118839Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.2118930Z cachedir: .pytest_cache
2025-12-04T10:35:21.2119383Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.2119487Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.2119631Z configfile: pytest.ini
2025-12-04T10:35:21.2120090Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.2120326Z collecting ... collected 188 items / 66 deselected / 122 selected
2025-12-04T10:35:21.2120454Z stepcurrent: skipping 66 already run items.
2025-12-04T10:35:21.2120548Z Running 122 items in this shard
2025-12-04T10:35:21.2120555Z 
2025-12-04T10:35:21.2120928Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [1.4014s] [  0%]
2025-12-04T10:35:21.2121695Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  1%]
2025-12-04T10:35:21.2122455Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  2%]
2025-12-04T10:35:21.2123213Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  3%]
2025-12-04T10:35:21.2124012Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  4%]
2025-12-04T10:35:21.2124782Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  4%]
2025-12-04T10:35:21.2125579Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  5%]
2025-12-04T10:35:21.2126387Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  6%]
2025-12-04T10:35:21.2127142Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  7%]
2025-12-04T10:35:21.2127576Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0002s] (Not supported on non B200) [  8%]
2025-12-04T10:35:21.2128111Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  9%]
2025-12-04T10:35:21.2128930Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [  9%]
2025-12-04T10:35:21.2129759Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 10%]
2025-12-04T10:35:21.2130561Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 11%]
2025-12-04T10:35:21.2131374Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 12%]
2025-12-04T10:35:21.2132171Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%]
2025-12-04T10:35:21.2133068Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%]
2025-12-04T10:35:21.2133865Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 14%]
2025-12-04T10:35:21.2134675Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 15%]
2025-12-04T10:35:21.2135462Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 16%]
2025-12-04T10:35:21.2136309Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 17%]
2025-12-04T10:35:21.2137140Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%]
2025-12-04T10:35:21.2137931Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%]
2025-12-04T10:35:21.2138785Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 19%]
2025-12-04T10:35:21.2139647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 20%]
2025-12-04T10:35:21.2140444Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 21%]
2025-12-04T10:35:21.2141246Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%]
2025-12-04T10:35:21.2142051Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%]
2025-12-04T10:35:21.2142854Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 23%]
2025-12-04T10:35:21.2143655Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 24%]
2025-12-04T10:35:21.2144464Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 25%]
2025-12-04T10:35:21.2145257Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 26%]
2025-12-04T10:35:21.2146169Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%]
2025-12-04T10:35:21.2146955Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%]
2025-12-04T10:35:21.2147751Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 28%]
2025-12-04T10:35:21.2148551Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 29%]
2025-12-04T10:35:21.2149366Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 30%]
2025-12-04T10:35:21.2150188Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%]
2025-12-04T10:35:21.2150986Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%]
2025-12-04T10:35:21.2151806Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 32%]
2025-12-04T10:35:21.2152600Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 33%]
2025-12-04T10:35:21.2153477Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 34%]
2025-12-04T10:35:21.2154334Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 35%]
2025-12-04T10:35:21.2155193Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%]
2025-12-04T10:35:21.2156046Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%]
2025-12-04T10:35:21.2156896Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 37%]
2025-12-04T10:35:21.2157724Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 38%]
2025-12-04T10:35:21.2158560Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 39%]
2025-12-04T10:35:21.2159464Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%]
2025-12-04T10:35:21.2160308Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%]
2025-12-04T10:35:21.2161139Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 41%]
2025-12-04T10:35:21.2161971Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 42%]
2025-12-04T10:35:21.2162809Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 43%]
2025-12-04T10:35:21.2163608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 44%]
2025-12-04T10:35:21.2164371Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 45%]
2025-12-04T10:35:21.2165139Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 45%]
2025-12-04T10:35:21.2165875Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 46%]
2025-12-04T10:35:21.2166489Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 47%]
2025-12-04T10:35:21.2167321Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 48%]
2025-12-04T10:35:21.2168161Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 49%]
2025-12-04T10:35:21.2168980Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%]
2025-12-04T10:35:21.2169813Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%]
2025-12-04T10:35:21.2170624Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 51%]
2025-12-04T10:35:21.2171452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 52%]
2025-12-04T10:35:21.2172346Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 53%]
2025-12-04T10:35:21.2173183Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%]
2025-12-04T10:35:21.2173985Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%]
2025-12-04T10:35:21.2174804Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 55%]
2025-12-04T10:35:21.2175610Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 56%]
2025-12-04T10:35:21.2176552Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 57%]
2025-12-04T10:35:21.2177371Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 58%]
2025-12-04T10:35:21.2178241Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%]
2025-12-04T10:35:21.2179116Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%]
2025-12-04T10:35:21.2179935Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 60%]
2025-12-04T10:35:21.2180745Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 61%]
2025-12-04T10:35:21.2181563Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 62%]
2025-12-04T10:35:21.2182388Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%]
2025-12-04T10:35:21.2183211Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%]
2025-12-04T10:35:21.2184024Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 64%]
2025-12-04T10:35:21.2184843Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 65%]
2025-12-04T10:35:21.2185737Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 66%]
2025-12-04T10:35:21.2186558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 67%]
2025-12-04T10:35:21.2187370Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%]
2025-12-04T10:35:21.2188204Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%]
2025-12-04T10:35:21.2189012Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 69%]
2025-12-04T10:35:21.2189865Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 70%]
2025-12-04T10:35:21.2190661Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 71%]
2025-12-04T10:35:21.2191546Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%]
2025-12-04T10:35:21.2192513Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%]
2025-12-04T10:35:21.2193480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 73%]
2025-12-04T10:35:21.2194428Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 74%]
2025-12-04T10:35:21.2195384Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 75%]
2025-12-04T10:35:21.2196365Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 76%]
2025-12-04T10:35:21.2197293Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%]
2025-12-04T10:35:21.2198217Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%]
2025-12-04T10:35:21.2199214Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1397s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 78%]
2025-12-04T10:35:21.2200148Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 79%]
2025-12-04T10:35:21.2201071Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 80%]
2025-12-04T10:35:21.2202005Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 81%]
2025-12-04T10:35:21.2202928Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 81%]
2025-12-04T10:35:21.2203917Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 82%]
2025-12-04T10:35:21.2204900Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 83%]
2025-12-04T10:35:21.2205851Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 84%]
2025-12-04T10:35:21.2206840Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 85%]
2025-12-04T10:35:21.2207900Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%]
2025-12-04T10:35:21.2208817Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%]
2025-12-04T10:35:21.2209744Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 87%]
2025-12-04T10:35:21.2210650Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 88%]
2025-12-04T10:35:21.2211572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 89%]
2025-12-04T10:35:21.2212480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%]
2025-12-04T10:35:21.2213515Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%]
2025-12-04T10:35:21.2214430Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 91%]
2025-12-04T10:35:21.2215284Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 92%]
2025-12-04T10:35:21.2216142Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 93%]
2025-12-04T10:35:21.2217029Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 94%]
2025-12-04T10:35:21.2217850Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%]
2025-12-04T10:35:21.2218744Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%]
2025-12-04T10:35:21.2219639Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 96%]
2025-12-04T10:35:21.2220454Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 97%]
2025-12-04T10:35:21.2221264Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 98%]
2025-12-04T10:35:21.2221864Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 99%]
2025-12-04T10:35:21.2222620Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [100%]
2025-12-04T10:35:21.2222638Z 
2025-12-04T10:35:21.2223356Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml -
2025-12-04T10:35:21.2223600Z ================ 1 passed, 121 skipped, 66 deselected in 1.73s =================
2025-12-04T10:35:21.2239339Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda', 
'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32', 
'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda']
2025-12-04T10:35:21.2239440Z 
2025-12-04T10:35:21.2239827Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_440b1865b73f9802_.log)
2025-12-04T10:35:21.2239834Z 
2025-12-04T10:35:21.2240087Z Finished inductor/test_fp8 1/1 ... [2025-12-04 10:35:19.644358][4991.653573178], took 19.80min
2025-12-04T10:35:21.2240697Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml
2025-12-04T10:35:21.2241348Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml
2025-12-04T10:35:21.2241989Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml
2025-12-04T10:35:21.2242582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml
2025-12-04T10:35:21.2243182Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml
2025-12-04T10:35:21.2243778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml
2025-12-04T10:35:21.2244365Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml
2025-12-04T10:35:21.2244973Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml
2025-12-04T10:35:21.2245570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml
2025-12-04T10:35:21.2246236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml
2025-12-04T10:35:21.2246828Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml
2025-12-04T10:35:21.2247464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml
2025-12-04T10:35:21.2248075Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml
2025-12-04T10:35:21.2248675Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml
2025-12-04T10:35:21.2249274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml
2025-12-04T10:35:21.2249873Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml
2025-12-04T10:35:21.2250472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml
2025-12-04T10:35:21.2251063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml
2025-12-04T10:35:21.2251658Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml
2025-12-04T10:35:21.2252269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml
2025-12-04T10:35:21.2252866Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml
2025-12-04T10:35:21.2253462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml
2025-12-04T10:35:21.2254052Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml
2025-12-04T10:35:21.2254640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml
2025-12-04T10:35:21.2255275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml
2025-12-04T10:35:21.2255938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml
2025-12-04T10:35:21.2256573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml
2025-12-04T10:35:21.2257159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml
2025-12-04T10:35:21.2257761Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml
2025-12-04T10:35:21.2258353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml
2025-12-04T10:35:21.2258952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml
2025-12-04T10:35:21.2259613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml
2025-12-04T10:35:21.2260246Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml
2025-12-04T10:35:21.2260881Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml
2025-12-04T10:35:21.2261514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml
2025-12-04T10:35:21.2262113Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml
2025-12-04T10:35:21.2262712Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml
2025-12-04T10:35:21.2263308Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml
2025-12-04T10:35:21.2263910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml
2025-12-04T10:35:21.2264504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml
2025-12-04T10:35:21.2265111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml
2025-12-04T10:35:21.2265704Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml
2025-12-04T10:35:21.2266343Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml
2025-12-04T10:35:21.2266947Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml
2025-12-04T10:35:21.2267536Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml
2025-12-04T10:35:21.2268144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml
2025-12-04T10:35:21.2268750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml
2025-12-04T10:35:21.2269399Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml
2025-12-04T10:35:21.2270037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml
2025-12-04T10:35:21.2270643Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml
2025-12-04T10:35:21.2514198Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml
2025-12-04T10:35:21.2782218Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml
2025-12-04T10:35:21.3079188Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml
2025-12-04T10:35:21.3372614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml
2025-12-04T10:35:21.3660864Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml
2025-12-04T10:35:21.4020613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml
2025-12-04T10:35:21.4368048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml
2025-12-04T10:35:21.4683200Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml
2025-12-04T10:35:21.4974175Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml
2025-12-04T10:35:21.5253660Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml
2025-12-04T10:35:21.5566514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml
2025-12-04T10:35:21.5897280Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml
2025-12-04T10:35:21.6185525Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml
2025-12-04T10:35:21.6509267Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml
2025-12-04T10:35:21.6756242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml
2025-12-04T10:35:21.7073485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml
2025-12-04T10:35:21.7361048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml
2025-12-04T10:35:21.7651732Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml
2025-12-04T10:35:21.7955041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml
2025-12-04T10:35:21.8250363Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml
2025-12-04T10:35:21.8549607Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml
2025-12-04T10:35:21.8867338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml
2025-12-04T10:35:21.9125174Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml
2025-12-04T10:35:21.9413791Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml
2025-12-04T10:35:21.9729077Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml
2025-12-04T10:35:22.0066345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml
2025-12-04T10:35:22.0328554Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml
2025-12-04T10:35:22.0631932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml
2025-12-04T10:35:22.1009812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml
2025-12-04T10:35:22.1315479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml
2025-12-04T10:35:22.1629926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml
2025-12-04T10:35:22.1947532Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml
2025-12-04T10:35:22.2210543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml
2025-12-04T10:35:22.2468201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml
2025-12-04T10:35:22.2743059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml
2025-12-04T10:35:22.3179229Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml
2025-12-04T10:35:22.3478372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml
2025-12-04T10:35:22.3782863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml
2025-12-04T10:35:22.4073446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml
2025-12-04T10:35:22.4400893Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml
2025-12-04T10:35:22.4830719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml
2025-12-04T10:35:22.5151191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml
2025-12-04T10:35:22.5453926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml
2025-12-04T10:35:22.5813768Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml
2025-12-04T10:35:22.6092699Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml
2025-12-04T10:35:22.6639125Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml
2025-12-04T10:35:22.6918235Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml
2025-12-04T10:35:22.7194421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml
2025-12-04T10:35:22.7448662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml
2025-12-04T10:35:22.7742208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml
2025-12-04T10:35:22.8032296Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml
2025-12-04T10:35:22.8821508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml
2025-12-04T10:35:22.9128820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml
2025-12-04T10:35:22.9422120Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml
2025-12-04T10:35:22.9725923Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml
2025-12-04T10:35:23.0010633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml
2025-12-04T10:35:23.3521061Z Uploading logs for 57118183212 to S3
2025-12-04T10:35:23.5029425Z Uploading artifacts took 0.47 seconds
2025-12-04T10:35:23.5029849Z inductor/test_fp8 1/1 failed!
2025-12-04T10:35:23.5033372Z Running dynamo/test_model_output 1/1 ... [2025-12-04 10:35:23.502972][4995.512194486]
2025-12-04T10:35:23.5034020Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:35:23.5038188Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_model_output.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:35:23.503420]
2025-12-04T10:35:27.5262471Z 
2025-12-04T10:35:27.5264533Z dynamo/test_model_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_model_output_1.1_2df9271f2ebae91b_.log
2025-12-04T10:35:27.5272579Z Running 18 items in this shard: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained, test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_assign, test/dynamo/test_model_output.py::TestModelOutput::test_mo_create, test/dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getitem, test/dynamo/test_model_output.py::TestModelOutput::test_mo_index, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init2, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable, test/dynamo/test_model_output.py::TestModelOutput::test_mo_newkey, test/dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode, test/dynamo/test_model_output.py::TestModelOutput::test_mo_tuple, test/dynamo/test_model_output.py::TestModelOutput::test_none, test/dynamo/test_model_output.py::TestModelOutput::test_reconstruction, test/dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda
2025-12-04T10:35:27.5278391Z 
2025-12-04T10:35:27.5278685Z Finished dynamo/test_model_output 1/1 ... [2025-12-04 10:35:27.525811][4999.535035943], took 0.07min
2025-12-04T10:35:27.5412289Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-fcf8b9b0a2e7a178.xml
2025-12-04T10:35:27.5733722Z Running inductor/test_triton_kernels 1/1 ... [2025-12-04 10:35:27.572980][4999.582204761]
2025-12-04T10:35:27.5734381Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:35:27.5737498Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:35:27.573331]
2025-12-04T10:38:04.9944647Z 
2025-12-04T10:38:04.9946990Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_4c43492168172809_.log
2025-12-04T10:38:05.0127164Z Running 366 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_mm_kernels_do_not_change, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_unaffected, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_to_cpu, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, test/inductor/test_triton_kernels.py::MutationTests::test_labels, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, 
test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op
2025-12-04T10:38:05.0305217Z 
2025-12-04T10:38:05.0305533Z Finished inductor/test_triton_kernels 1/1 ... [2025-12-04 10:38:04.994456][5157.003678354], took 2.62min
2025-12-04T10:38:05.0306775Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-cc2491bbd877af9c.xml
2025-12-04T10:38:05.1080369Z Running inductor/test_loop_ordering 1/1 ... [2025-12-04 10:38:05.107619][5157.116841139]
2025-12-04T10:38:05.1080898Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:38:05.1083460Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_loop_ordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:38:05.107956]
2025-12-04T10:38:42.0923394Z 
2025-12-04T10:38:42.0926456Z inductor/test_loop_ordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_loop_ordering_1.1_cda1b68c4235c80b_.log
2025-12-04T10:38:42.0946065Z Running 53 items in this shard: test/inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_view, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing, 
test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True, test/inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise, test/inductor/test_loop_ordering.py::TestTiling::test_cat, test/inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var, test/inductor/test_loop_ordering.py::TestTiling::test_mutation_deps, test/inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont, 
test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction, test/inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases, test/inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression
2025-12-04T10:38:42.0964871Z 
2025-12-04T10:38:42.0965226Z Finished inductor/test_loop_ordering 1/1 ... [2025-12-04 10:38:42.092100][5194.101324917], took 0.62min
2025-12-04T10:38:42.1080258Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-66246eed1b64fd5c.xml
2025-12-04T10:38:42.1914710Z Running export/test_serdes 1/1 ... [2025-12-04 10:38:42.191046][5194.200268439]
2025-12-04T10:38:42.1915166Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:38:42.1917705Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:38:42.191392]
2025-12-04T10:41:57.5428707Z 
2025-12-04T10:41:57.5429656Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_c37c9c83d5d3a964_.log
2025-12-04T10:41:57.5850127Z Running 880 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_annotate_on_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cdist_forward_compute_mode_zero_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_enum_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_leak_compile_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_rnn_variants_with_warning_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_flex_attention_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inductor_backend_inside_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_161902_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_namedtuple_input_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_native_multi_attention_head_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_fake_tensor_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_annotation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_context_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tag_ac_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tril_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_triu_dynamic_diagonal_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_custom_autograd_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_to_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_where_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_annotate_on_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_enum_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_leak_compile_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_rnn_variants_with_warning_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_flex_attention_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_161902_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_namedtuple_input_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_native_multi_attention_head_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_fake_tensor_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_annotation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_context_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tag_ac_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tril_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_triu_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_custom_autograd_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_to_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_where_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict
2025-12-04T10:41:57.6266320Z 
2025-12-04T10:41:57.6266603Z Finished export/test_serdes 1/1 ... [2025-12-04 10:41:57.544328][5389.553550452], took 3.26min
2025-12-04T10:41:57.6267608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serdes/export.test_serdes-38411ac3079c7061.xml
2025-12-04T10:41:57.6918610Z Running inductor/test_scatter_optimization 1/1 ... [2025-12-04 10:41:57.691388][5389.700609018]
2025-12-04T10:41:57.6919151Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:41:57.6921559Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_scatter_optimization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:41:57.691758]
2025-12-04T10:42:12.1954594Z 
2025-12-04T10:42:12.1955592Z inductor/test_scatter_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_scatter_optimization_1.1_38363d3a7ae9f86e_.log
2025-12-04T10:42:12.1959420Z Running 8 items in this shard: test/inductor/test_scatter_optimization.py::TestScatterOpt::test_3d_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_dense, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_non_const, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_cross_entropy_loss, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_neg_scatter_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_non_last_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_nonzero_const_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_shorter_index_tensor
2025-12-04T10:42:12.1962594Z 
2025-12-04T10:42:12.1962942Z Finished inductor/test_scatter_optimization 1/1 ... [2025-12-04 10:42:12.195133][5404.204356776], took 0.24min
2025-12-04T10:42:12.2113623Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-ca7327bb8f17c961.xml
2025-12-04T10:42:12.2903300Z Running inductor/test_padding 1/1 ... [2025-12-04 10:42:12.289935][5404.299157682]
2025-12-04T10:42:12.2903779Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:42:12.2906391Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:42:12.290272]
2025-12-04T10:42:48.6777139Z 
2025-12-04T10:42:48.6778029Z inductor/test_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_padding_1.1_3b58a6813a3709bc_.log
2025-12-04T10:42:48.6804229Z Running 55 items in this shard: test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_BertForMaskedLM, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_nobias_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer_small_bs, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_nvidia_deeprecommender, test/inductor/test_padding.py::PaddingTest::test_LinearAndSoftmax_codegen, test/inductor/test_padding.py::PaddingTest::test_attention, test/inductor/test_padding.py::PaddingTest::test_cat, test/inductor/test_padding.py::PaddingTest::test_conv, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_matmul, test/inductor/test_padding.py::PaddingTest::test_mm_padding_perf, test/inductor/test_padding.py::PaddingTest::test_nobias_LinearAndSoftmax_codegen, 
test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape0_alignment_bytes_32_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape1_alignment_bytes_32_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape2_alignment_bytes_64_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape3_alignment_bytes_64_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_pad_3d_tensor, test/inductor/test_padding.py::PaddingTest::test_pad_channels_last, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float32, 
test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float32, test/inductor/test_padding.py::PaddingTest::test_pad_strides, test/inductor/test_padding.py::PaddingTest::test_pad_strides_skip, test/inductor/test_padding.py::PaddingTest::test_padmm, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape0_perm0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape1_perm1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape2_perm2_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape3_perm3_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape4_perm4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape5_perm5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape6_perm6_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape7_perm7_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_view
2025-12-04T10:42:48.6828166Z 
2025-12-04T10:42:48.6828587Z Finished inductor/test_padding 1/1 ... [2025-12-04 10:42:48.677424][5440.686646006], took 0.61min
2025-12-04T10:42:48.6945605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-b7f63c3b423acf1d.xml
2025-12-04T10:42:48.7802374Z Running dynamo/test_callback 1/1 ... [2025-12-04 10:42:48.779855][5440.789078719]
2025-12-04T10:42:48.7802940Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:42:48.7806108Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_callback.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:42:48.780209]
2025-12-04T10:43:02.2230395Z 
2025-12-04T10:43:02.2231299Z dynamo/test_callback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_callback_1.1_4647abf0637b193b_.log
2025-12-04T10:43:02.2233327Z Running 4 items in this shard: test/dynamo/test_callback.py::CallbackTests::test_callbacks_with_duplicate_prevention, test/dynamo/test_callback.py::CallbackTests::test_counter, test/dynamo/test_callback.py::CallbackTests::test_counter_assertion, test/dynamo/test_callback.py::CallbackTests::test_triggers
2025-12-04T10:43:02.2234681Z 
2025-12-04T10:43:02.2234973Z Finished dynamo/test_callback 1/1 ... [2025-12-04 10:43:02.222596][5454.231819815], took 0.22min
2025-12-04T10:43:02.2406731Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_callback/dynamo.test_callback-6c0ee54264bcedf0.xml
2025-12-04T10:43:02.3237328Z Running inductor/test_custom_op_autotune 1/1 ... [2025-12-04 10:43:02.323317][5454.332539314]
2025-12-04T10:43:02.3237966Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:43:02.3240812Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_custom_op_autotune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:43:02.323688]
2025-12-04T10:43:22.7815959Z 
2025-12-04T10:43:22.7817470Z inductor/test_custom_op_autotune 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_custom_op_autotune_1.1_2272505dccfac9af_.log
2025-12-04T10:43:22.7821095Z Running 3 items in this shard: test/inductor/test_custom_op_autotune.py::TestCustomOpAutoTune::test_decompose_k_custom_op_autotune_dynamic_config_for_input_shape, test/inductor/test_custom_op_autotune.py::TestCustomOpAutoTune::test_multi_parameter_tuning, test/inductor/test_custom_op_autotune.py::TestCustomOpAutoTune::test_rmsnorm_custom_op_autotune_with_dynamic_shape
2025-12-04T10:43:22.7823858Z 
2025-12-04T10:43:22.7824451Z Finished inductor/test_custom_op_autotune 1/1 ... [2025-12-04 10:43:22.781254][5474.790478088], took 0.34min
2025-12-04T10:43:22.7984655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_custom_op_autotune/inductor.test_custom_op_autotune-8f7d8d00cc13374f.xml
2025-12-04T10:43:22.8901374Z Running test_cuda 1/1 ... [2025-12-04 10:43:22.889718][5474.898940519]
2025-12-04T10:43:22.8902025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:43:22.8904727Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:43:22.890069]
2025-12-04T13:44:40.0741884Z 
2025-12-04T13:44:40.0742766Z PRINTING LOG FILE of test_cuda 1/1 (test/test-reports/test_cuda_1.1_5ed6ed395e86485d_.log)
2025-12-04T13:44:40.0743801Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-f963d2e44bab839f.xml
2025-12-04T13:44:40.0744593Z ============================= test session starts ==============================
2025-12-04T13:44:40.0745864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.0746602Z cachedir: .pytest_cache
2025-12-04T13:44:40.0747512Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.0748677Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.0749140Z configfile: pytest.ini
2025-12-04T13:44:40.0750017Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.0751047Z collecting ... collected 252 items
2025-12-04T13:44:40.0751537Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T13:44:40.0855127Z Running 252 items in this shard: test/test_cuda.py::TestCuda::test_arithmetic_large_tensor, test/test_cuda.py::TestCuda::test_batch_norm_gather_stats, test/test_cuda.py::TestCuda::test_bincount_ext, test/test_cuda.py::TestCuda::test_caching_allocator_record_stream_oom, test/test_cuda.py::TestCuda::test_caching_pinned_memory, test/test_cuda.py::TestCuda::test_check_error, test/test_cuda.py::TestCuda::test_copy_non_blocking, test/test_cuda.py::TestCuda::test_copy_non_blocking_type_conversion, test/test_cuda.py::TestCuda::test_cublas_allow_bf16_reduced_precision_reduction_get_set, test/test_cuda.py::TestCuda::test_cublas_allow_fp16_accumulation_get_set, test/test_cuda.py::TestCuda::test_cublas_allow_fp16_reduced_precision_reduction_get_set, test/test_cuda.py::TestCuda::test_cublas_allow_tf32_get_set, test/test_cuda.py::TestCuda::test_cublas_multiple_threads_same_device, test/test_cuda.py::TestCuda::test_cublas_workspace_explicit_allocation, test/test_cuda.py::TestCuda::test_cuda_get_device_capability, test/test_cuda.py::TestCuda::test_cuda_get_device_name, test/test_cuda.py::TestCuda::test_cuda_get_device_properties, test/test_cuda.py::TestCuda::test_cuda_graph_allocator_propagates_stream, test/test_cuda.py::TestCuda::test_cuda_graph_error_options, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_False, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_True, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_keep_graph_false, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_reset_and_recapture, test/test_cuda.py::TestCuda::test_cuda_graph_tensor_item_not_allowed, test/test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow, test/test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow_large, test/test_cuda.py::TestCuda::test_cuda_memory_leak_detection_propagates_errors, test/test_cuda.py::TestCuda::test_cuda_stream_protocol, 
test/test_cuda.py::TestCuda::test_cudart_register, test/test_cuda.py::TestCuda::test_cudnn_allow_tf32_get_set, test/test_cuda.py::TestCuda::test_cudnn_multiple_threads_same_device, test/test_cuda.py::TestCuda::test_cusparse_multiple_threads_same_device, test/test_cuda.py::TestCuda::test_device_context_manager, test/test_cuda.py::TestCuda::test_device_count_not_cached_pre_init, test/test_cuda.py::TestCuda::test_events, test/test_cuda.py::TestCuda::test_events_elapsedtime, test/test_cuda.py::TestCuda::test_fixed_cuda_assert_async, test/test_cuda.py::TestCuda::test_float32_matmul_precision_get_set, test/test_cuda.py::TestCuda::test_fp32_precision_with_float32_matmul_precision, test/test_cuda.py::TestCuda::test_fp32_precision_with_tf32, test/test_cuda.py::TestCuda::test_gather_bool, test/test_cuda.py::TestCuda::test_gds_fails_in_ci, test/test_cuda.py::TestCuda::test_generic_stream_event, test/test_cuda.py::TestCuda::test_get_device_index, test/test_cuda.py::TestCuda::test_get_per_process_memory_fraction, test/test_cuda.py::TestCuda::test_graph_capture_oom, test/test_cuda.py::TestCuda::test_graph_capture_reset_recapture, test/test_cuda.py::TestCuda::test_graph_capture_simple, test/test_cuda.py::TestCuda::test_graph_checkpoint_preserve_rng_state, test/test_cuda.py::TestCuda::test_graph_concurrent_replay, test/test_cuda.py::TestCuda::test_graph_cudnn_dropout, test/test_cuda.py::TestCuda::test_graph_debugdump, test/test_cuda.py::TestCuda::test_graph_error, test/test_cuda.py::TestCuda::test_graph_is_current_stream_capturing, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_disabled_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_enabled_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_allow_unused_input, 
test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_not_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_same_pool, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_with_amp_cache_enabled_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_without_amp_not_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_manual_seed_mismatch_raises, test/test_cuda.py::TestCuda::test_graph_memory_stats_and_use_result_after_destroy_graph, test/test_cuda.py::TestCuda::test_graph_optims_with_explicitly_capturable_param_groups, test/test_cuda.py::TestCuda::test_graph_record_stream, test/test_cuda.py::TestCuda::test_graph_rng_distributions, test/test_cuda.py::TestCuda::test_graph_rng_functional, test/test_cuda.py::TestCuda::test_graph_three_successive, test/test_cuda.py::TestCuda::test_graph_timing, test/test_cuda.py::TestCuda::test_graph_two_successive, test/test_cuda.py::TestCuda::test_graph_warn_if_has_zero_nodes, test/test_cuda.py::TestCuda::test_graphsafe_set_get_rng_state, test/test_cuda.py::TestCuda::test_hip_device_count, test/test_cuda.py::TestCuda::test_host_memory_stats, test/test_cuda.py::TestCuda::test_huge_index, test/test_cuda.py::TestCuda::test_index_out_of_bounds_exception_cuda, test/test_cuda.py::TestCuda::test_invalid_status_for_legacy_api, test/test_cuda.py::TestCuda::test_is_pinned_no_context, test/test_cuda.py::TestCuda::test_lazy_init, test/test_cuda.py::TestCuda::test_manual_seed, test/test_cuda.py::TestCuda::test_matmul_device_mismatch, test/test_cuda.py::TestCuda::test_matmul_memory_use, test/test_cuda.py::TestCuda::test_max_large_axis, test/test_cuda.py::TestCuda::test_mean_fp16, test/test_cuda.py::TestCuda::test_memory_allocation, test/test_cuda.py::TestCuda::test_memory_stats, test/test_cuda.py::TestCuda::test_memory_stats_of_multiple_generators_and_graphs, test/test_cuda.py::TestCuda::test_min_max_inits, 
test/test_cuda.py::TestCuda::test_multi_device_context_manager, test/test_cuda.py::TestCuda::test_multi_device_stream_context_manager, test/test_cuda.py::TestCuda::test_multinomial_ext, test/test_cuda.py::TestCuda::test_multinomial_invalid_probs_cuda, test/test_cuda.py::TestCuda::test_noncontiguous_pinned_memory, test/test_cuda.py::TestCuda::test_norm_type_conversion, test/test_cuda.py::TestCuda::test_nvtx, test/test_cuda.py::TestCuda::test_out_of_memory, test/test_cuda.py::TestCuda::test_out_of_memory_retry, test/test_cuda.py::TestCuda::test_pinned_memory_empty_cache, test/test_cuda.py::TestCuda::test_pinned_memory_use_background_threads, test/test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister, test/test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister_multithread, test/test_cuda.py::TestCuda::test_preferred_blas_library_settings, test/test_cuda.py::TestCuda::test_prod_large, test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel, test/test_cuda.py::TestCuda::test_randint_randomness_for_large_range, test/test_cuda.py::TestCuda::test_random_no_reused_random_states_float32, test/test_cuda.py::TestCuda::test_random_no_reused_random_states_float64, test/test_cuda.py::TestCuda::test_record_stream, test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view, test/test_cuda.py::TestCuda::test_reduction_gpu_memory_accessing, test/test_cuda.py::TestCuda::test_repeat_graph_capture_cublas_workspace_memory, test/test_cuda.py::TestCuda::test_rocm_backward_pass_guard, test/test_cuda.py::TestCuda::test_serialization_array_with_empty, test/test_cuda.py::TestCuda::test_serialization_array_with_storage, test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction, test/test_cuda.py::TestCuda::test_specify_improper_device_name, test/test_cuda.py::TestCuda::test_stream_compatibility, test/test_cuda.py::TestCuda::test_stream_context_manager, test/test_cuda.py::TestCuda::test_stream_event_repr, 
test/test_cuda.py::TestCuda::test_streaming_backwards_callback, test/test_cuda.py::TestCuda::test_streaming_backwards_multiple_streams, test/test_cuda.py::TestCuda::test_streaming_backwards_sync, test/test_cuda.py::TestCuda::test_streaming_backwards_sync_graph_root, test/test_cuda.py::TestCuda::test_streams, test/test_cuda.py::TestCuda::test_sum_fp16, test/test_cuda.py::TestCuda::test_tiny_half_norm_, test/test_cuda.py::TestCuda::test_to_cpu_blocking_by_default, test/test_cuda.py::TestCuda::test_to_non_blocking, test/test_cuda.py::TestCuda::test_to_numpy, test/test_cuda.py::TestCuda::test_torch_manual_seed_seeds_cuda_devices, test/test_cuda.py::TestCuda::test_type_conversions, test/test_cuda.py::TestCuda::test_uuid, test/test_cuda.py::TestCudaMallocAsync::test_allocator_backend, test/test_cuda.py::TestCudaMallocAsync::test_allocator_fuzz, test/test_cuda.py::TestCudaMallocAsync::test_allocator_memory_fraction_setting, test/test_cuda.py::TestCudaMallocAsync::test_allocator_settings, test/test_cuda.py::TestCudaMallocAsync::test_cachingAllocator_raw_alloc, test/test_cuda.py::TestCudaMallocAsync::test_clock_speed, test/test_cuda.py::TestCudaMallocAsync::test_cpp_memory_snapshot_pickle, test/test_cuda.py::TestCudaMallocAsync::test_cycles, test/test_cuda.py::TestCudaMallocAsync::test_device_memory_used, test/test_cuda.py::TestCudaMallocAsync::test_direct_traceback, test/test_cuda.py::TestCudaMallocAsync::test_garbage_collect_expandable, test/test_cuda.py::TestCudaMallocAsync::test_max_split_expandable, test/test_cuda.py::TestCudaMallocAsync::test_memory_compile_regions, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_segment_stack, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_stack, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_history_context, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_metadata, 
test/test_cuda.py::TestCudaMallocAsync::test_memory_profiler_viz, test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot, test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_script, test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_with_cpp, test/test_cuda.py::TestCudaMallocAsync::test_notifies_oom, test/test_cuda.py::TestCudaMallocAsync::test_nvml_get_handler, test/test_cuda.py::TestCudaMallocAsync::test_power_draw, test/test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_False, test/test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_True, test/test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_count, test/test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_uuids, test/test_cuda.py::TestCudaMallocAsync::test_temperature, test/test_cuda.py::TestCudaMallocAsync::test_uuid_visible_devices, test/test_cuda.py::TestBlockStateAbsorption::test_additional_free_following_checkpoint, test/test_cuda.py::TestBlockStateAbsorption::test_allocate_in_thread_to_pool, test/test_cuda.py::TestBlockStateAbsorption::test_allocated_in_middle_of_segment, test/test_cuda.py::TestBlockStateAbsorption::test_assigning_back_deleter_fns_to_tensor, test/test_cuda.py::TestBlockStateAbsorption::test_check_pool_live_allocations, test/test_cuda.py::TestBlockStateAbsorption::test_middle_allocations_contiguous, test/test_cuda.py::TestBlockStateAbsorption::test_multiple_middle_allocations, test/test_cuda.py::TestBlockStateAbsorption::test_no_triton_on_import, test/test_cuda.py::TestBlockStateAbsorption::test_resnet, test/test_cuda.py::TestBlockStateAbsorption::test_simple, test/test_cuda.py::TestBlockStateAbsorption::test_tensor_dies_after_checkpoint, test/test_cuda.py::TestMemPool::test_graph_capture_reclaim_2_streams, test/test_cuda.py::TestMemPool::test_graph_capture_reclaim_4_streams, test/test_cuda.py::TestMemPool::test_mempool_ctx_multithread, test/test_cuda.py::TestMemPool::test_mempool_empty_cache, 
test/test_cuda.py::TestMemPool::test_mempool_empty_cache_inactive, test/test_cuda.py::TestMemPool::test_mempool_emptycache_multithread, test/test_cuda.py::TestMemPool::test_mempool_expandable, test/test_cuda.py::TestMemPool::test_mempool_id, test/test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator, test/test_cuda.py::TestMemPool::test_mempool_multithread, test/test_cuda.py::TestMemPool::test_mempool_with_allocator, test/test_cuda.py::TestMemPool::test_nested_mempool, test/test_cuda.py::TestGDS::test_gds_read_write_tensors, test/test_cuda.py::TestCudaAutocast::test_autocast_banned, test/test_cuda.py::TestCudaAutocast::test_autocast_cache_leak, test/test_cuda.py::TestCudaAutocast::test_autocast_cat_jit, test/test_cuda.py::TestCudaAutocast::test_autocast_checkpointing, test/test_cuda.py::TestCudaAutocast::test_autocast_custom_cast_inputs, test/test_cuda.py::TestCudaAutocast::test_autocast_custom_deprecated_warning, test/test_cuda.py::TestCudaAutocast::test_autocast_custom_enabled, test/test_cuda.py::TestCudaAutocast::test_autocast_ignored_types, test/test_cuda.py::TestCudaAutocast::test_autocast_linalg_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_methods_expect_builtin_promote, test/test_cuda.py::TestCudaAutocast::test_autocast_methods_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_methods_fp32, test/test_cuda.py::TestCudaAutocast::test_autocast_nn_bf16, test/test_cuda.py::TestCudaAutocast::test_autocast_nn_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_nn_fp32, test/test_cuda.py::TestCudaAutocast::test_autocast_rnn, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_bf16, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_expect_builtin_promote, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_fp32, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_need_autocast_promote, 
test/test_cuda.py::TestCudaAutocast::test_cuda_autocast_deprecated_warning, test/test_cuda.py::TestCompileKernel::test_compile_kernel, test/test_cuda.py::TestCompileKernel::test_compile_kernel_advanced, test/test_cuda.py::TestCompileKernel::test_compile_kernel_as_custom_op, test/test_cuda.py::TestCompileKernel::test_compile_kernel_cuda_headers, test/test_cuda.py::TestCompileKernel::test_compile_kernel_custom_op_validation, test/test_cuda.py::TestCompileKernel::test_compile_kernel_dlpack, test/test_cuda.py::TestCompileKernel::test_compile_kernel_double_precision, test/test_cuda.py::TestCompileKernel::test_compile_kernel_large_shared_memory, test/test_cuda.py::TestCompileKernel::test_compile_kernel_template, test/test_cuda.py::TestFXMemoryProfiler::test_fx_memory_profiler_augmentation, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adagrad_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_SGD_cuda_float32, 
test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_ASGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adadelta_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adamax_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_NAdam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RAdam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RMSprop_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Rprop_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_SGD_cuda_float32, 
test/test_cuda.py::TestCudaDeviceParametrizedCUDA::test_graph_external_wait_and_record_cuda
2025-12-04T13:44:40.0957469Z 
2025-12-04T13:44:40.0958265Z test_cuda.py::TestCuda::test_arithmetic_large_tensor SKIPPED [0.0003s] (was disabled due to not enough memory, but actually it always fail) [  0%]
2025-12-04T13:44:40.0959503Z test_cuda.py::TestCuda::test_batch_norm_gather_stats PASSED [0.0946s]    [  0%]
2025-12-04T13:44:40.0960403Z test_cuda.py::TestCuda::test_bincount_ext PASSED [0.0414s]               [  1%]
2025-12-04T13:44:40.0961226Z test_cuda.py::TestCuda::test_caching_allocator_record_stream_oom PASSED [0.2270s] [  1%]
2025-12-04T13:44:40.0962103Z test_cuda.py::TestCuda::test_caching_pinned_memory PASSED [0.9978s]      [  1%]
2025-12-04T13:44:40.0963129Z test_cuda.py::TestCuda::test_check_error PASSED [0.0015s]                [  2%]
2025-12-04T13:44:40.0963971Z test_cuda.py::TestCuda::test_copy_non_blocking PASSED [0.0544s]          [  2%]
2025-12-04T13:44:40.0964986Z test_cuda.py::TestCuda::test_copy_non_blocking_type_conversion PASSED [0.1007s] [  3%]
2025-12-04T13:44:40.0966134Z test_cuda.py::TestCuda::test_cublas_allow_bf16_reduced_precision_reduction_get_set PASSED [0.0019s] [  3%]
2025-12-04T13:44:40.0967254Z test_cuda.py::TestCuda::test_cublas_allow_fp16_accumulation_get_set PASSED [0.0019s] [  3%]
2025-12-04T13:44:40.0968330Z test_cuda.py::TestCuda::test_cublas_allow_fp16_reduced_precision_reduction_get_set PASSED [0.0014s] [  4%]
2025-12-04T13:44:40.0969451Z test_cuda.py::TestCuda::test_cublas_allow_tf32_get_set PASSED [0.0013s]  [  4%]
2025-12-04T13:44:40.0970441Z test_cuda.py::TestCuda::test_cublas_multiple_threads_same_device PASSED [0.1636s] [  5%]
2025-12-04T13:44:40.0971515Z test_cuda.py::TestCuda::test_cublas_workspace_explicit_allocation PASSED [0.0045s] [  5%]
2025-12-04T13:44:40.0972533Z test_cuda.py::TestCuda::test_cuda_get_device_capability PASSED [0.0015s] [  5%]
2025-12-04T13:44:40.0973475Z test_cuda.py::TestCuda::test_cuda_get_device_name PASSED [0.0015s]       [  6%]
2025-12-04T13:44:40.0974379Z test_cuda.py::TestCuda::test_cuda_get_device_properties PASSED [0.0015s] [  6%]
2025-12-04T13:44:40.0975500Z test_cuda.py::TestCuda::test_cuda_graph_allocator_propagates_stream PASSED [0.0035s] [  7%]
2025-12-04T13:44:40.0976530Z test_cuda.py::TestCuda::test_cuda_graph_error_options PASSED [0.0180s]   [  7%]
2025-12-04T13:44:40.0977453Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph PASSED [0.0468s]       [  7%]
2025-12-04T13:44:40.0978570Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_False PASSED [0.0025s] [  8%]
2025-12-04T13:44:40.0979760Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_True PASSED [0.0025s] [  8%]
2025-12-04T13:44:40.0980866Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_keep_graph_false PASSED [0.0027s] [  9%]
2025-12-04T13:44:40.0981951Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_reset_and_recapture PASSED [0.0027s] [  9%]
2025-12-04T13:44:40.0983096Z test_cuda.py::TestCuda::test_cuda_graph_tensor_item_not_allowed Traceback (most recent call last):
2025-12-04T13:44:40.0983949Z   File "<string>", line 17, in <module>
2025-12-04T13:44:40.0984445Z   File "<string>", line 7, in my_func
2025-12-04T13:44:40.0985225Z torch.AcceleratorError: CUDA error: operation not permitted when stream is capturing
2025-12-04T13:44:40.0986788Z Search for `cudaErrorStreamCaptureUnsupported' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.0988490Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.0989558Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.0990324Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.0990819Z 
2025-12-04T13:44:40.0990826Z 
2025-12-04T13:44:40.0991155Z During handling of the above exception, another exception occurred:
2025-12-04T13:44:40.0991666Z 
2025-12-04T13:44:40.0991836Z Traceback (most recent call last):
2025-12-04T13:44:40.0992300Z   File "<string>", line 16, in <module>
2025-12-04T13:44:40.0993192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/graphs.py", line 267, in __exit__
2025-12-04T13:44:40.0994090Z     self.cuda_graph.capture_end()
2025-12-04T13:44:40.0994982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/graphs.py", line 129, in capture_end
2025-12-04T13:44:40.0995952Z     super().capture_end()
2025-12-04T13:44:40.0996696Z torch.AcceleratorError: CUDA error: operation failed due to a previous error during capture
2025-12-04T13:44:40.0998138Z Search for `cudaErrorStreamCaptureInvalidated' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.0999839Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1000880Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1001558Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1002020Z 
2025-12-04T13:44:40.1002153Z PASSED [1.8930s] [  9%]
2025-12-04T13:44:40.1002723Z test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow PASSED [0.0334s]  [ 10%]
2025-12-04T13:44:40.1003587Z test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow_large PASSED [0.0771s] [ 10%]
2025-12-04T13:44:40.1004542Z test_cuda.py::TestCuda::test_cuda_memory_leak_detection_propagates_errors PASSED [0.0021s] [ 11%]
2025-12-04T13:44:40.1005528Z test_cuda.py::TestCuda::test_cuda_stream_protocol PASSED [0.0015s]       [ 11%]
2025-12-04T13:44:40.1006354Z test_cuda.py::TestCuda::test_cudart_register PASSED [0.0019s]            [ 11%]
2025-12-04T13:44:40.1007168Z test_cuda.py::TestCuda::test_cudnn_allow_tf32_get_set PASSED [0.0014s]   [ 12%]
2025-12-04T13:44:40.1008235Z test_cuda.py::TestCuda::test_cudnn_multiple_threads_same_device PASSED [2.7467s] [ 12%]
2025-12-04T13:44:40.1009222Z test_cuda.py::TestCuda::test_cusparse_multiple_threads_same_device PASSED [36.3461s] [ 13%]
2025-12-04T13:44:40.1010171Z test_cuda.py::TestCuda::test_device_context_manager PASSED [0.0017s]     [ 13%]
2025-12-04T13:44:40.1011356Z test_cuda.py::TestCuda::test_device_count_not_cached_pre_init SKIPPED [0.0002s] (requires multiple devices) [ 13%]
2025-12-04T13:44:40.1012472Z test_cuda.py::TestCuda::test_events PASSED [0.0511s]                     [ 14%]
2025-12-04T13:44:40.1013449Z test_cuda.py::TestCuda::test_events_elapsedtime PASSED [0.0017s]         [ 14%]
2025-12-04T13:44:40.1015159Z test_cuda.py::TestCuda::test_fixed_cuda_assert_async /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1016694Z Traceback (most recent call last):
2025-12-04T13:44:40.1017144Z   File "<string>", line 4, in <module>
2025-12-04T13:44:40.1017969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1018857Z     return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1019559Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1020707Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1022186Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1023264Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1023943Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1024454Z 
2025-12-04T13:44:40.1025399Z /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1026750Z Traceback (most recent call last):
2025-12-04T13:44:40.1027164Z   File "<string>", line 4, in <module>
2025-12-04T13:44:40.1027951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1028703Z     return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1029290Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1030288Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1031442Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1032125Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1032584Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1034055Z 
2025-12-04T13:44:40.1034686Z /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1035656Z Traceback (most recent call last):
2025-12-04T13:44:40.1036035Z   File "<string>", line 4, in <module>
2025-12-04T13:44:40.1036621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1037221Z     return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1037636Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1038419Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1039394Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1040077Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1040548Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1040859Z 
2025-12-04T13:44:40.1041501Z /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:113: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1042345Z Traceback (most recent call last):
2025-12-04T13:44:40.1042696Z   File "<string>", line 4, in <module>
2025-12-04T13:44:40.1043290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1043887Z     return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1044384Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1045215Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1046194Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1046878Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1047346Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1047665Z 
2025-12-04T13:44:40.1047762Z PASSED [7.5865s]    [ 15%]
2025-12-04T13:44:40.1048190Z test_cuda.py::TestCuda::test_float32_matmul_precision_get_set PASSED [0.0018s] [ 15%]
2025-12-04T13:44:40.1048882Z test_cuda.py::TestCuda::test_fp32_precision_with_float32_matmul_precision PASSED [0.0015s] [ 15%]
2025-12-04T13:44:40.1049547Z test_cuda.py::TestCuda::test_fp32_precision_with_tf32 PASSED [0.0016s]   [ 16%]
2025-12-04T13:44:40.1050121Z test_cuda.py::TestCuda::test_gather_bool PASSED [0.0133s]                [ 16%]
2025-12-04T13:44:40.1050674Z test_cuda.py::TestCuda::test_gds_fails_in_ci PASSED [0.8428s]            [ 17%]
2025-12-04T13:44:40.1051244Z test_cuda.py::TestCuda::test_generic_stream_event PASSED [0.0031s]       [ 17%]
2025-12-04T13:44:40.1051814Z test_cuda.py::TestCuda::test_get_device_index PASSED [0.0015s]           [ 17%]
2025-12-04T13:44:40.1052411Z test_cuda.py::TestCuda::test_get_per_process_memory_fraction PASSED [0.0016s] [ 18%]
2025-12-04T13:44:40.1052995Z test_cuda.py::TestCuda::test_graph_capture_oom PASSED [0.3562s]          [ 18%]
2025-12-04T13:44:40.1053593Z test_cuda.py::TestCuda::test_graph_capture_reset_recapture PASSED [0.0031s] [ 19%]
2025-12-04T13:44:40.1054179Z test_cuda.py::TestCuda::test_graph_capture_simple PASSED [0.0025s]       [ 19%]
2025-12-04T13:44:40.1054783Z test_cuda.py::TestCuda::test_graph_checkpoint_preserve_rng_state PASSED [0.0089s] [ 19%]
2025-12-04T13:44:40.1055450Z test_cuda.py::TestCuda::test_graph_concurrent_replay PASSED [0.0143s]    [ 20%]
2025-12-04T13:44:40.1056112Z test_cuda.py::TestCuda::test_graph_cudnn_dropout PASSED [0.0518s]        [ 20%]
2025-12-04T13:44:40.1056847Z test_cuda.py::TestCuda::test_graph_debugdump PASSED [0.1549s]            [ 21%]
2025-12-04T13:44:40.1057684Z test_cuda.py::TestCuda::test_graph_error PASSED [1.9977s]                [ 21%]
2025-12-04T13:44:40.1058496Z test_cuda.py::TestCuda::test_graph_is_current_stream_capturing PASSED [0.1336s] [ 21%]
2025-12-04T13:44:40.1059653Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_disabled_allow_unused_input PASSED [0.2867s] [ 22%]
2025-12-04T13:44:40.1060846Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_enabled_allow_unused_input XFAIL [0.1578s] [ 22%]
2025-12-04T13:44:40.1061976Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_allow_unused_input PASSED [0.2925s] [ 23%]
2025-12-04T13:44:40.1063090Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_not_allow_unused_input PASSED [0.2906s] [ 23%]
2025-12-04T13:44:40.1064293Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_same_pool SKIPPED [0.0004s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 23%]
2025-12-04T13:44:40.1065728Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_with_amp_cache_enabled_allow_unused_input XFAIL [0.3487s] [ 24%]
2025-12-04T13:44:40.1067012Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_without_amp_not_allow_unused_input XFAIL [0.3263s] [ 24%]
2025-12-04T13:44:40.1068267Z test_cuda.py::TestCuda::test_graph_manual_seed_mismatch_raises PASSED [0.1424s] [ 25%]
2025-12-04T13:44:40.1069389Z test_cuda.py::TestCuda::test_graph_memory_stats_and_use_result_after_destroy_graph PASSED [1.5978s] [ 25%]
2025-12-04T13:44:40.1070638Z test_cuda.py::TestCuda::test_graph_optims_with_explicitly_capturable_param_groups PASSED [0.3633s] [ 25%]
2025-12-04T13:44:40.1071782Z test_cuda.py::TestCuda::test_graph_record_stream PASSED [0.1627s]        [ 26%]
2025-12-04T13:44:40.1072681Z test_cuda.py::TestCuda::test_graph_rng_distributions PASSED [0.2154s]    [ 26%]
2025-12-04T13:44:40.1073540Z test_cuda.py::TestCuda::test_graph_rng_functional PASSED [0.1421s]       [ 26%]
2025-12-04T13:44:40.1074399Z test_cuda.py::TestCuda::test_graph_three_successive PASSED [0.1377s]     [ 27%]
2025-12-04T13:44:40.1075337Z test_cuda.py::TestCuda::test_graph_timing PASSED [0.1339s]               [ 27%]
2025-12-04T13:44:40.1076210Z test_cuda.py::TestCuda::test_graph_two_successive PASSED [0.1426s]       [ 28%]
2025-12-04T13:44:40.1077114Z test_cuda.py::TestCuda::test_graph_warn_if_has_zero_nodes PASSED [0.1333s] [ 28%]
2025-12-04T13:44:40.1078025Z test_cuda.py::TestCuda::test_graphsafe_set_get_rng_state PASSED [0.1364s] [ 28%]
2025-12-04T13:44:40.1079067Z test_cuda.py::TestCuda::test_hip_device_count SKIPPED [0.0002s] (not relevant for CUDA testing) [ 29%]
2025-12-04T13:44:40.1081889Z test_cuda.py::TestCuda::test_host_memory_stats SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/148607 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 29%]
2025-12-04T13:44:40.1084835Z test_cuda.py::TestCuda::test_huge_index SKIPPED [0.1327s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 30%]
2025-12-04T13:44:40.1086446Z test_cuda.py::TestCuda::test_index_out_of_bounds_exception_cuda SKIPPED [0.1324s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 30%]
2025-12-04T13:44:40.1089572Z test_cuda.py::TestCuda::test_invalid_status_for_legacy_api SKIPPED [0.0006s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157110 for platform(s) linux, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 30%]
2025-12-04T13:44:40.1092262Z test_cuda.py::TestCuda::test_is_pinned_no_context PASSED [2.0462s]       [ 31%]
2025-12-04T13:44:40.1093251Z test_cuda.py::TestCuda::test_lazy_init PASSED [3.4449s]                  [ 31%]
2025-12-04T13:44:40.1094121Z test_cuda.py::TestCuda::test_manual_seed PASSED [0.1355s]                [ 32%]
2025-12-04T13:44:40.1095085Z test_cuda.py::TestCuda::test_matmul_device_mismatch PASSED [0.1354s]     [ 32%]
2025-12-04T13:44:40.1096018Z test_cuda.py::TestCuda::test_matmul_memory_use PASSED [0.1385s]          [ 32%]
2025-12-04T13:44:40.1097195Z test_cuda.py::TestCuda::test_max_large_axis SKIPPED [0.1329s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 33%]
2025-12-04T13:44:40.1098359Z test_cuda.py::TestCuda::test_mean_fp16 PASSED [0.1397s]                  [ 33%]
2025-12-04T13:44:40.1099300Z test_cuda.py::TestCuda::test_memory_allocation PASSED [0.2639s]          [ 34%]
2025-12-04T13:44:40.1100193Z test_cuda.py::TestCuda::test_memory_stats PASSED [0.3966s]               [ 34%]
2025-12-04T13:44:40.1101214Z test_cuda.py::TestCuda::test_memory_stats_of_multiple_generators_and_graphs PASSED [0.5354s] [ 34%]
2025-12-04T13:44:40.1102244Z test_cuda.py::TestCuda::test_min_max_inits PASSED [0.1334s]              [ 35%]
2025-12-04T13:44:40.1103298Z test_cuda.py::TestCuda::test_multi_device_context_manager SKIPPED [0.0002s] (only one GPU detected) [ 35%]
2025-12-04T13:44:40.1104611Z test_cuda.py::TestCuda::test_multi_device_stream_context_manager SKIPPED [0.0002s] (only one GPU detected) [ 36%]
2025-12-04T13:44:40.1105882Z test_cuda.py::TestCuda::test_multinomial_ext PASSED [0.1714s]            [ 36%]
2025-12-04T13:44:40.1107176Z test_cuda.py::TestCuda::test_multinomial_invalid_probs_cuda SKIPPED [0.1337s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 36%]
2025-12-04T13:44:40.1109142Z test_cuda.py::TestCuda::test_noncontiguous_pinned_memory PASSED [0.1327s] [ 37%]
2025-12-04T13:44:40.1109747Z test_cuda.py::TestCuda::test_norm_type_conversion PASSED [0.1532s]       [ 37%]
2025-12-04T13:44:40.1110314Z test_cuda.py::TestCuda::test_nvtx PASSED [0.1326s]                       [ 38%]
2025-12-04T13:44:40.1110872Z test_cuda.py::TestCuda::test_out_of_memory PASSED [0.1340s]              [ 38%]
2025-12-04T13:44:40.1111415Z test_cuda.py::TestCuda::test_out_of_memory_retry PASSED [0.1431s]        [ 38%]
2025-12-04T13:44:40.1111979Z test_cuda.py::TestCuda::test_pinned_memory_empty_cache PASSED [0.1651s]  [ 39%]
2025-12-04T13:44:40.1112596Z test_cuda.py::TestCuda::test_pinned_memory_use_background_threads PASSED [2.0022s] [ 39%]
2025-12-04T13:44:40.1113239Z test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister PASSED [0.1719s] [ 40%]
2025-12-04T13:44:40.1113920Z test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister_multithread PASSED [0.2822s] [ 40%]
2025-12-04T13:44:40.1114600Z test_cuda.py::TestCuda::test_preferred_blas_library_settings PASSED [3.4513s] [ 40%]
2025-12-04T13:44:40.1115196Z test_cuda.py::TestCuda::test_prod_large PASSED [4.0793s]                 [ 41%]
2025-12-04T13:44:40.1115792Z test_cuda.py::TestCuda::test_randint_generation_for_large_numel PASSED [0.3513s] [ 41%]
2025-12-04T13:44:40.1116438Z test_cuda.py::TestCuda::test_randint_randomness_for_large_range PASSED [0.2230s] [ 42%]
2025-12-04T13:44:40.1117096Z test_cuda.py::TestCuda::test_random_no_reused_random_states_float32 PASSED [0.6347s] [ 42%]
2025-12-04T13:44:40.1117756Z test_cuda.py::TestCuda::test_random_no_reused_random_states_float64 PASSED [0.6922s] [ 42%]
2025-12-04T13:44:40.1118367Z test_cuda.py::TestCuda::test_record_stream PASSED [0.1862s]              [ 43%]
2025-12-04T13:44:40.1118998Z test_cuda.py::TestCuda::test_record_stream_on_shifted_view Command took >60min, returning 124
2025-12-04T13:44:40.1119487Z Got exit code 124
2025-12-04T13:44:40.1119704Z Retrying single test...
2025-12-04T13:44:40.1120197Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-4cb2e826acd2b876.xml
2025-12-04T13:44:40.1120789Z ============================= test session starts ==============================
2025-12-04T13:44:40.1121342Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.1122081Z cachedir: .pytest_cache
2025-12-04T13:44:40.1122694Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.1123430Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.1123720Z configfile: pytest.ini
2025-12-04T13:44:40.1124342Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.1125140Z collecting ... collected 252 items / 251 deselected / 1 selected
2025-12-04T13:44:40.1125887Z stepcurrent: skipping 109 already run items. Running only test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view
2025-12-04T13:44:40.1126520Z Running 1 items in this shard
2025-12-04T13:44:40.1126705Z 
2025-12-04T13:44:40.1126999Z test_cuda.py::TestCuda::test_record_stream_on_shifted_view Command took >60min, returning 124
2025-12-04T13:44:40.1127496Z Got exit code 124
2025-12-04T13:44:40.1127719Z Retrying single test...
2025-12-04T13:44:40.1128201Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-7396cb3929bf8579.xml
2025-12-04T13:44:40.1128789Z ============================= test session starts ==============================
2025-12-04T13:44:40.1129345Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.1129840Z cachedir: .pytest_cache
2025-12-04T13:44:40.1130507Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.1131183Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.1131524Z configfile: pytest.ini
2025-12-04T13:44:40.1132137Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.1132903Z collecting ... collected 252 items / 251 deselected / 1 selected
2025-12-04T13:44:40.1133628Z stepcurrent: skipping 109 already run items. Running only test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view
2025-12-04T13:44:40.1134253Z Running 1 items in this shard
2025-12-04T13:44:40.1134438Z 
2025-12-04T13:44:40.1134730Z test_cuda.py::TestCuda::test_record_stream_on_shifted_view Command took >60min, returning 124
2025-12-04T13:44:40.1135254Z Got exit code 124
2025-12-04T13:44:40.1135676Z FAILED CONSISTENTLY: test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view
2025-12-04T13:44:40.1136418Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:44:40.1137225Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml
2025-12-04T13:44:40.1137815Z ============================= test session starts ==============================
2025-12-04T13:44:40.1138359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.1138858Z cachedir: .pytest_cache
2025-12-04T13:44:40.1139585Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.1140244Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.1140533Z configfile: pytest.ini
2025-12-04T13:44:40.1141140Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.1141907Z collecting ... collected 252 items / 110 deselected / 142 selected
2025-12-04T13:44:40.1142328Z stepcurrent: skipping 110 already run items.
2025-12-04T13:44:40.1142653Z Running 142 items in this shard
2025-12-04T13:44:40.1142832Z 
2025-12-04T13:44:40.1143091Z test_cuda.py::TestCuda::test_reduction_gpu_memory_accessing PASSED [0.0280s] [  0%]
2025-12-04T13:44:40.1143780Z test_cuda.py::TestCuda::test_repeat_graph_capture_cublas_workspace_memory PASSED [0.1344s] [  1%]
2025-12-04T13:44:40.1144494Z test_cuda.py::TestCuda::test_rocm_backward_pass_guard SKIPPED [0.0003s] (ROCm-only test) [  2%]
2025-12-04T13:44:40.1145232Z test_cuda.py::TestCuda::test_serialization_array_with_empty PASSED [0.0616s] [  2%]
2025-12-04T13:44:40.1145910Z test_cuda.py::TestCuda::test_serialization_array_with_storage PASSED [0.0045s] [  3%]
2025-12-04T13:44:40.1146532Z test_cuda.py::TestCuda::test_set_per_process_memory_fraction PASSED [0.0139s] [  4%]
2025-12-04T13:44:40.1147141Z test_cuda.py::TestCuda::test_specify_improper_device_name PASSED [0.0024s] [  4%]
2025-12-04T13:44:40.1147723Z test_cuda.py::TestCuda::test_stream_compatibility PASSED [0.0022s]       [  5%]
2025-12-04T13:44:40.1148295Z test_cuda.py::TestCuda::test_stream_context_manager PASSED [0.0015s]     [  6%]
2025-12-04T13:44:40.1148854Z test_cuda.py::TestCuda::test_stream_event_repr PASSED [0.0013s]          [  7%]
2025-12-04T13:44:40.1149433Z test_cuda.py::TestCuda::test_streaming_backwards_callback PASSED [0.0103s] [  7%]
2025-12-04T13:44:40.1150067Z test_cuda.py::TestCuda::test_streaming_backwards_multiple_streams PASSED [0.0610s] [  8%]
2025-12-04T13:44:40.1150694Z test_cuda.py::TestCuda::test_streaming_backwards_sync PASSED [0.0109s]   [  9%]
2025-12-04T13:44:40.1151315Z test_cuda.py::TestCuda::test_streaming_backwards_sync_graph_root PASSED [0.2600s] [  9%]
2025-12-04T13:44:40.1161001Z test_cuda.py::TestCuda::test_streams PASSED [0.0019s]                    [ 10%]
2025-12-04T13:44:40.1161652Z test_cuda.py::TestCuda::test_sum_fp16 PASSED [0.0350s]                   [ 11%]
2025-12-04T13:44:40.1162209Z test_cuda.py::TestCuda::test_tiny_half_norm_ PASSED [0.0296s]            [ 11%]
2025-12-04T13:44:40.1162773Z test_cuda.py::TestCuda::test_to_cpu_blocking_by_default PASSED [0.1097s] [ 12%]
2025-12-04T13:44:40.1163386Z test_cuda.py::TestCuda::test_to_non_blocking PASSED [0.4351s]            [ 13%]
2025-12-04T13:44:40.1163947Z test_cuda.py::TestCuda::test_to_numpy PASSED [0.0019s]                   [ 14%]
2025-12-04T13:44:40.1164547Z test_cuda.py::TestCuda::test_torch_manual_seed_seeds_cuda_devices PASSED [0.0027s] [ 14%]
2025-12-04T13:44:40.1165191Z test_cuda.py::TestCuda::test_type_conversions PASSED [0.0025s]           [ 15%]
2025-12-04T13:44:40.1165775Z test_cuda.py::TestCuda::test_uuid PASSED [0.0013s]                       [ 16%]
2025-12-04T13:44:40.1166362Z test_cuda.py::TestCudaMallocAsync::test_allocator_backend PASSED [1.6906s] [ 16%]
2025-12-04T13:44:40.1166991Z test_cuda.py::TestCudaMallocAsync::test_allocator_fuzz PASSED [1.2409s]  [ 17%]
2025-12-04T13:44:40.1167688Z test_cuda.py::TestCudaMallocAsync::test_allocator_memory_fraction_setting PASSED [8.4148s] [ 18%]
2025-12-04T13:44:40.1168395Z test_cuda.py::TestCudaMallocAsync::test_allocator_settings PASSED [0.0060s] [ 19%]
2025-12-04T13:44:40.1169066Z test_cuda.py::TestCudaMallocAsync::test_cachingAllocator_raw_alloc PASSED [0.0108s] [ 19%]
2025-12-04T13:44:40.1169852Z test_cuda.py::TestCudaMallocAsync::test_clock_speed SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 20%]
2025-12-04T13:44:40.1170643Z test_cuda.py::TestCudaMallocAsync::test_cpp_memory_snapshot_pickle PASSED [12.2423s] [ 21%]
2025-12-04T13:44:40.1171651Z test_cuda.py::TestCudaMallocAsync::test_cycles W1204 13:44:07.887000 110856 site-packages/torch/utils/viz/_cycles.py:59] CUDA Memory changed during GC, 512 bytes freed.
2025-12-04T13:44:40.1172440Z PASSED [0.3256s]          [ 21%]
2025-12-04T13:44:40.1173025Z test_cuda.py::TestCudaMallocAsync::test_device_memory_used SKIPPED [0.0003s] (pynvml/amdsmi is not available) [ 22%]
2025-12-04T13:44:40.1173802Z test_cuda.py::TestCudaMallocAsync::test_direct_traceback PASSED [0.0019s] [ 23%]
2025-12-04T13:44:40.1174457Z test_cuda.py::TestCudaMallocAsync::test_garbage_collect_expandable PASSED [0.0061s] [ 23%]
2025-12-04T13:44:40.1175167Z test_cuda.py::TestCudaMallocAsync::test_max_split_expandable PASSED [0.0092s] [ 24%]
2025-12-04T13:44:40.1175857Z test_cuda.py::TestCudaMallocAsync::test_memory_compile_regions PASSED [3.2056s] [ 25%]
2025-12-04T13:44:40.1176498Z test_cuda.py::TestCudaMallocAsync::test_memory_plots PASSED [0.0659s]    [ 26%]
2025-12-04T13:44:40.1177219Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_segment_stack PASSED [0.0051s] [ 26%]
2025-12-04T13:44:40.1177932Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_stack PASSED [0.0048s] [ 27%]
2025-12-04T13:44:40.1178675Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_history_context PASSED [0.0017s] [ 28%]
2025-12-04T13:44:40.1179462Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_metadata PASSED [0.0023s] [ 28%]
2025-12-04T13:44:40.1180122Z test_cuda.py::TestCudaMallocAsync::test_memory_profiler_viz PASSED [0.0452s] [ 29%]
2025-12-04T13:44:40.1181959Z test_cuda.py::TestCudaMallocAsync::test_memory_snapshot SKIPPED [0.0007s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/126953 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 30%]
2025-12-04T13:44:40.1183784Z test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_script PASSED [0.0034s] [ 30%]
2025-12-04T13:44:40.1185762Z test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_with_cpp SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/137249 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 31%]
2025-12-04T13:44:40.1187578Z test_cuda.py::TestCudaMallocAsync::test_notifies_oom PASSED [0.0149s]    [ 32%]
2025-12-04T13:44:40.1188368Z test_cuda.py::TestCudaMallocAsync::test_nvml_get_handler SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 33%]
2025-12-04T13:44:40.1189242Z test_cuda.py::TestCudaMallocAsync::test_power_draw SKIPPED [0.0003s] (pynvml/amdsmi is not available) [ 33%]
2025-12-04T13:44:40.1190075Z test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_False PASSED [0.0146s] [ 34%]
2025-12-04T13:44:40.1190895Z test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_True PASSED [0.0152s] [ 35%]
2025-12-04T13:44:40.1191770Z test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_count SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 35%]
2025-12-04T13:44:40.1192737Z test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_uuids SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 36%]
2025-12-04T13:44:40.1193655Z test_cuda.py::TestCudaMallocAsync::test_temperature SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 37%]
2025-12-04T13:44:40.1194570Z test_cuda.py::TestCudaMallocAsync::test_uuid_visible_devices SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 38%]
2025-12-04T13:44:40.1195471Z test_cuda.py::TestBlockStateAbsorption::test_additional_free_following_checkpoint PASSED [0.2267s] [ 38%]
2025-12-04T13:44:40.1196285Z test_cuda.py::TestBlockStateAbsorption::test_allocate_in_thread_to_pool PASSED [0.3993s] [ 39%]
2025-12-04T13:44:40.1197074Z test_cuda.py::TestBlockStateAbsorption::test_allocated_in_middle_of_segment PASSED [0.1978s] [ 40%]
2025-12-04T13:44:40.1197907Z test_cuda.py::TestBlockStateAbsorption::test_assigning_back_deleter_fns_to_tensor PASSED [0.2191s] [ 40%]
2025-12-04T13:44:40.1198710Z test_cuda.py::TestBlockStateAbsorption::test_check_pool_live_allocations PASSED [0.1971s] [ 41%]
2025-12-04T13:44:40.1199501Z test_cuda.py::TestBlockStateAbsorption::test_middle_allocations_contiguous PASSED [0.1975s] [ 42%]
2025-12-04T13:44:40.1200287Z test_cuda.py::TestBlockStateAbsorption::test_multiple_middle_allocations PASSED [0.1983s] [ 42%]
2025-12-04T13:44:40.1201022Z test_cuda.py::TestBlockStateAbsorption::test_no_triton_on_import PASSED [2.0633s] [ 43%]
2025-12-04T13:44:40.1201693Z test_cuda.py::TestBlockStateAbsorption::test_resnet PASSED [0.6759s]     [ 44%]
2025-12-04T13:44:40.1202329Z test_cuda.py::TestBlockStateAbsorption::test_simple PASSED [0.2007s]     [ 45%]
2025-12-04T13:44:40.1203082Z test_cuda.py::TestBlockStateAbsorption::test_tensor_dies_after_checkpoint PASSED [0.1988s] [ 45%]
2025-12-04T13:44:40.1203791Z test_cuda.py::TestMemPool::test_graph_capture_reclaim_2_streams PASSED [0.0024s] [ 46%]
2025-12-04T13:44:40.1204497Z test_cuda.py::TestMemPool::test_graph_capture_reclaim_4_streams PASSED [0.0025s] [ 47%]
2025-12-04T13:44:40.1206366Z test_cuda.py::TestMemPool::test_mempool_ctx_multithread SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/153460 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 47%]
2025-12-04T13:44:40.1208465Z test_cuda.py::TestMemPool::test_mempool_empty_cache PASSED [0.0019s]     [ 48%]
2025-12-04T13:44:40.1210236Z test_cuda.py::TestMemPool::test_mempool_empty_cache_inactive SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/159663 for platform(s) linux, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 49%]
2025-12-04T13:44:40.1212029Z test_cuda.py::TestMemPool::test_mempool_emptycache_multithread PASSED [0.0035s] [ 50%]
2025-12-04T13:44:40.1214676Z test_cuda.py::TestMemPool::test_mempool_expandable [1/2] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=dummy_allocator -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /opt/conda/envs/py_3.10/include/python3.10 -fPIC -std=c++17 -c /var/lib/jenkins/.cache/torch_extensions/py310_cu128/dummy_allocator/main.cpp -o main.o 
2025-12-04T13:44:40.1217795Z [2/2] c++ main.o -shared -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o dummy_allocator.so
2025-12-04T13:44:40.1218766Z PASSED [2.0904s]      [ 50%]
2025-12-04T13:44:40.1219230Z test_cuda.py::TestMemPool::test_mempool_id PASSED [0.0012s]              [ 51%]
2025-12-04T13:44:40.1221045Z test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157256 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 52%]
2025-12-04T13:44:40.1222843Z test_cuda.py::TestMemPool::test_mempool_multithread PASSED [0.0018s]     [ 52%]
2025-12-04T13:44:40.1224583Z test_cuda.py::TestMemPool::test_mempool_with_allocator SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/154566 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 53%]
2025-12-04T13:44:40.1226319Z test_cuda.py::TestMemPool::test_nested_mempool PASSED [0.0026s]          [ 54%]
2025-12-04T13:44:40.1227068Z test_cuda.py::TestGDS::test_gds_read_write_tensors SKIPPED [0.0002s] (Disabling as USE_CUFILE=0 by default in builds) [ 54%]
2025-12-04T13:44:40.1227826Z test_cuda.py::TestCudaAutocast::test_autocast_banned PASSED [0.0080s]    [ 55%]
2025-12-04T13:44:40.1228512Z test_cuda.py::TestCudaAutocast::test_autocast_cache_leak PASSED [0.1341s] [ 56%]
2025-12-04T13:44:40.1229334Z test_cuda.py::TestCudaAutocast::test_autocast_cat_jit PASSED [0.0072s]   [ 57%]
2025-12-04T13:44:40.1230041Z test_cuda.py::TestCudaAutocast::test_autocast_checkpointing PASSED [0.0095s] [ 57%]
2025-12-04T13:44:40.1230701Z test_cuda.py::TestCudaAutocast::test_autocast_custom_cast_inputs PASSED [0.0079s] [ 58%]
2025-12-04T13:44:40.1231513Z test_cuda.py::TestCudaAutocast::test_autocast_custom_deprecated_warning PASSED [0.0039s] [ 59%]
2025-12-04T13:44:40.1232220Z test_cuda.py::TestCudaAutocast::test_autocast_custom_enabled PASSED [0.1212s] [ 59%]
2025-12-04T13:44:40.1232925Z test_cuda.py::TestCudaAutocast::test_autocast_ignored_types PASSED [0.0534s] [ 60%]
2025-12-04T13:44:40.1233558Z test_cuda.py::TestCudaAutocast::test_autocast_linalg_fp16 PASSED [0.0040s] [ 61%]
2025-12-04T13:44:40.1234260Z test_cuda.py::TestCudaAutocast::test_autocast_methods_expect_builtin_promote PASSED [0.0047s] [ 61%]
2025-12-04T13:44:40.1234973Z test_cuda.py::TestCudaAutocast::test_autocast_methods_fp16 PASSED [0.0032s] [ 62%]
2025-12-04T13:44:40.1235667Z test_cuda.py::TestCudaAutocast::test_autocast_methods_fp32 PASSED [0.0035s] [ 63%]
2025-12-04T13:44:40.1236280Z test_cuda.py::TestCudaAutocast::test_autocast_nn_bf16 PASSED [0.0033s]   [ 64%]
2025-12-04T13:44:40.1236887Z test_cuda.py::TestCudaAutocast::test_autocast_nn_fp16 PASSED [0.0032s]   [ 64%]
2025-12-04T13:44:40.1237495Z test_cuda.py::TestCudaAutocast::test_autocast_nn_fp32 PASSED [0.0548s]   [ 65%]
2025-12-04T13:44:40.1238093Z test_cuda.py::TestCudaAutocast::test_autocast_rnn PASSED [10.2264s]      [ 66%]
2025-12-04T13:44:40.1238709Z test_cuda.py::TestCudaAutocast::test_autocast_torch_bf16 PASSED [0.0653s] [ 66%]
2025-12-04T13:44:40.1239448Z test_cuda.py::TestCudaAutocast::test_autocast_torch_expect_builtin_promote PASSED [0.0056s] [ 67%]
2025-12-04T13:44:40.1240139Z test_cuda.py::TestCudaAutocast::test_autocast_torch_fp16 PASSED [0.0625s] [ 68%]
2025-12-04T13:44:40.1240744Z test_cuda.py::TestCudaAutocast::test_autocast_torch_fp32 PASSED [0.8243s] [ 69%]
2025-12-04T13:44:40.1241475Z test_cuda.py::TestCudaAutocast::test_autocast_torch_need_autocast_promote PASSED [0.0468s] [ 69%]
2025-12-04T13:44:40.1242211Z test_cuda.py::TestCudaAutocast::test_cuda_autocast_deprecated_warning PASSED [0.0097s] [ 70%]
2025-12-04T13:44:40.1242876Z test_cuda.py::TestCompileKernel::test_compile_kernel PASSED [0.1258s]    [ 71%]
2025-12-04T13:44:40.1243512Z test_cuda.py::TestCompileKernel::test_compile_kernel_advanced PASSED [0.1881s] [ 71%]
2025-12-04T13:44:40.1244193Z test_cuda.py::TestCompileKernel::test_compile_kernel_as_custom_op PASSED [0.0376s] [ 72%]
2025-12-04T13:44:40.1244879Z test_cuda.py::TestCompileKernel::test_compile_kernel_cuda_headers PASSED [0.0447s] [ 73%]
2025-12-04T13:44:40.1245599Z test_cuda.py::TestCompileKernel::test_compile_kernel_custom_op_validation PASSED [0.1875s] [ 73%]
2025-12-04T13:44:40.1246295Z test_cuda.py::TestCompileKernel::test_compile_kernel_dlpack PASSED [0.0298s] [ 74%]
2025-12-04T13:44:40.1246982Z test_cuda.py::TestCompileKernel::test_compile_kernel_double_precision PASSED [0.0351s] [ 75%]
2025-12-04T13:44:40.1247733Z test_cuda.py::TestCompileKernel::test_compile_kernel_large_shared_memory PASSED [0.0417s] [ 76%]
2025-12-04T13:44:40.1248431Z test_cuda.py::TestCompileKernel::test_compile_kernel_template PASSED [0.0709s] [ 76%]
2025-12-04T13:44:40.1249136Z test_cuda.py::TestFXMemoryProfiler::test_fx_memory_profiler_augmentation PASSED [0.2687s] [ 77%]
2025-12-04T13:44:40.1250056Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_AdamW_cuda_float32 PASSED [0.2208s] [ 78%]
2025-12-04T13:44:40.1251135Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_Adam_cuda_float32 PASSED [0.0038s] [ 78%]
2025-12-04T13:44:40.1252203Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_SGD_cuda_float32 PASSED [0.0030s] [ 79%]
2025-12-04T13:44:40.1253276Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_AdamW_cuda_float32 PASSED [0.0030s] [ 80%]
2025-12-04T13:44:40.1254432Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_Adam_cuda_float32 PASSED [0.0028s] [ 80%]
2025-12-04T13:44:40.1255540Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_SGD_cuda_float32 PASSED [0.0027s] [ 81%]
2025-12-04T13:44:40.1256854Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adagrad_cuda_float32 SKIPPED [0.0013s] (cuda is not supported for fused on Adagrad) [ 82%]
2025-12-04T13:44:40.1258013Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32 PASSED [0.9212s] [ 83%]
2025-12-04T13:44:40.1258969Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adam_cuda_float32 PASSED [0.8974s] [ 83%]
2025-12-04T13:44:40.1259988Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_SGD_cuda_float32 PASSED [0.4747s] [ 84%]
2025-12-04T13:44:40.1260957Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_AdamW_cuda_float32 PASSED [0.0091s] [ 85%]
2025-12-04T13:44:40.1261950Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_Adam_cuda_float32 PASSED [0.0064s] [ 85%]
2025-12-04T13:44:40.1262946Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_SGD_cuda_float32 PASSED [0.0062s] [ 86%]
2025-12-04T13:44:40.1263931Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_AdamW_cuda_float32 PASSED [0.0066s] [ 87%]
2025-12-04T13:44:40.1264906Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_Adam_cuda_float32 PASSED [0.0066s] [ 88%]
2025-12-04T13:44:40.1265973Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_SGD_cuda_float32 PASSED [0.0064s] [ 88%]
2025-12-04T13:44:40.1266951Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_AdamW_cuda_float32 PASSED [0.0363s] [ 89%]
2025-12-04T13:44:40.1267983Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_Adam_cuda_float32 PASSED [0.0071s] [ 90%]
2025-12-04T13:44:40.1268944Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_SGD_cuda_float32 PASSED [0.0061s] [ 90%]
2025-12-04T13:44:40.1269804Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_ASGD_cuda_float32 PASSED [0.3956s] [ 91%]
2025-12-04T13:44:40.1270555Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adadelta_cuda_float32 PASSED [0.2056s] [ 92%]
2025-12-04T13:44:40.1271308Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_AdamW_cuda_float32 PASSED [0.3401s] [ 92%]
2025-12-04T13:44:40.1272030Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adam_cuda_float32 PASSED [0.3369s] [ 93%]
2025-12-04T13:44:40.1272762Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adamax_cuda_float32 PASSED [0.2239s] [ 94%]
2025-12-04T13:44:40.1273507Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_NAdam_cuda_float32 PASSED [0.3925s] [ 95%]
2025-12-04T13:44:40.1274241Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RAdam_cuda_float32 PASSED [0.3729s] [ 95%]
2025-12-04T13:44:40.1274975Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RMSprop_cuda_float32 PASSED [0.2100s] [ 96%]
2025-12-04T13:44:40.1275719Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Rprop_cuda_float32 PASSED [0.3199s] [ 97%]
2025-12-04T13:44:40.1276538Z test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_AdamW_cuda_float32 PASSED [0.1081s] [ 97%]
2025-12-04T13:44:40.1277423Z test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_Adam_cuda_float32 PASSED [0.1071s] [ 98%]
2025-12-04T13:44:40.1278297Z test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_SGD_cuda_float32 PASSED [0.0444s] [ 99%]
2025-12-04T13:44:40.1279195Z test_cuda.py::TestCudaDeviceParametrizedCUDA::test_graph_external_wait_and_record_cuda PASSED [1.0415s] [100%]
2025-12-04T13:44:40.1279714Z 
2025-12-04T13:44:40.1280196Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml -
2025-12-04T13:44:40.1280972Z =============== 125 passed, 17 skipped, 110 deselected in 55.06s ===============
2025-12-04T13:44:40.1281680Z The following tests failed consistently: ['test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view']
2025-12-04T13:44:40.1282167Z 
2025-12-04T13:44:40.1282525Z FINISHED PRINTING LOG FILE of test_cuda 1/1 (test/test-reports/test_cuda_1.1_5ed6ed395e86485d_.log)
2025-12-04T13:44:40.1282958Z 
2025-12-04T13:44:40.1283178Z Finished test_cuda 1/1 ... [2025-12-04 13:44:40.074490][16352.0837123], took 181.29min
2025-12-04T13:44:40.1284033Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml
2025-12-04T13:44:40.6792493Z Uploading logs for 57118183212 to S3
2025-12-04T13:44:40.8626392Z Uploading artifacts took 0.67 seconds
2025-12-04T13:44:40.8626803Z test_cuda 1/1 failed!
2025-12-04T13:44:40.8630793Z Running test_sparse 1/1 ... [2025-12-04 13:44:40.862766][16352.871989054]
2025-12-04T13:44:40.8631367Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:44:40.8635176Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:44:40.863158]
2025-12-04T14:00:07.5784017Z 
2025-12-04T14:00:07.5784932Z PRINTING LOG FILE of test_sparse 1/1 (test/test-reports/test_sparse_1.1_e217f60a40d48402_.log)
2025-12-04T14:00:07.5786081Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml
2025-12-04T14:00:07.5786699Z ============================= test session starts ==============================
2025-12-04T14:00:07.5787424Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.5788213Z cachedir: .pytest_cache
2025-12-04T14:00:07.5788827Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.5789504Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.5789791Z configfile: pytest.ini
2025-12-04T14:00:07.5790412Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.5791094Z collecting ... collected 3100 items
2025-12-04T14:00:07.5791447Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T14:00:07.6946896Z Running 3100 items in this shard: test/test_sparse.py::TestSparseLegacyAndDeprecation::test_legacy_warnings, test/test_sparse.py::TestSparseOneOff::test_cuda_from_cpu, test/test_sparse.py::TestSparseOneOff::test_cuda_sparse_cpu_dense_add, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSR_float64, 
test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSR_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int8, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int32, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_float64, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_any_cuda, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_assign_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_basic_ops_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_deterministic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_oob_cuda, test/test_sparse.py::TestSparseCUDA::test_bmm_windows_error_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_coalesce_accepts_large_tensor_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_reference_cycle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_transpose_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_is_coalesced_with_gradcheck_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_large_sizes_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cuda_empty_cuda, test/test_sparse.py::TestSparseCUDA::test_div_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dtypes_cuda, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float32, 
test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_copy_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float32, 
test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_device_type_inference_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_empty_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_floor_divide_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_hsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_is_nonzero_cuda, test/test_sparse.py::TestSparseCUDA::test_is_sparse_cuda, test/test_sparse.py::TestSparseCUDA::test_isnan_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_device_cuda, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_log_softmax_float_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mv_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_negative_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_new_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_new_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_new_device_multi_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_new_device_single_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_pickle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_coalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_uncoalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_resize_as_cuda, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_same_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_uint8, 
test/test_sparse.py::TestSparseCUDA::test_shared_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_shared_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_small_nnz_coalesced_cuda, test/test_sparse.py::TestSparseCUDA::test_softmax_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_spadd_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_out_bfloat16_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int16, 
test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_to_numpy_cuda, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_storage_not_null_cuda, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_slow_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_generate_simple_inputs_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_invalid_blocksize_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_uint8, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_Strided_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_Strided_cuda
2025-12-04T14:00:07.8105443Z 
2025-12-04T14:00:07.8105833Z test_sparse.py::TestSparseLegacyAndDeprecation::test_legacy_warnings PASSED [0.0225s] [  0%]
2025-12-04T14:00:07.8106556Z test_sparse.py::TestSparseOneOff::test_cuda_from_cpu PASSED [0.0259s]    [  0%]
2025-12-04T14:00:07.8107205Z test_sparse.py::TestSparseOneOff::test_cuda_sparse_cpu_dense_add PASSED [0.0020s] [  0%]
2025-12-04T14:00:07.8108057Z test_sparse.py::TestSparseMeta::test_add_meta_SparseBSC_float64 PASSED [0.0655s] [  0%]
2025-12-04T14:00:07.8108751Z test_sparse.py::TestSparseMeta::test_add_meta_SparseBSR_float64 PASSED [0.0628s] [  0%]
2025-12-04T14:00:07.8109440Z test_sparse.py::TestSparseMeta::test_add_meta_SparseCOO_float64 PASSED [0.0418s] [  0%]
2025-12-04T14:00:07.8110382Z test_sparse.py::TestSparseMeta::test_add_meta_SparseCSC_float64 PASSED [0.0579s] [  0%]
2025-12-04T14:00:07.8111043Z test_sparse.py::TestSparseMeta::test_add_meta_SparseCSR_float64 PASSED [0.0579s] [  0%]
2025-12-04T14:00:07.8111702Z test_sparse.py::TestSparseMeta::test_fake_SparseBSC_float64 PASSED [0.3204s] [  0%]
2025-12-04T14:00:07.8112439Z test_sparse.py::TestSparseMeta::test_fake_SparseBSR_float64 PASSED [0.3130s] [  0%]
2025-12-04T14:00:07.8113074Z test_sparse.py::TestSparseMeta::test_fake_SparseCOO_float64 PASSED [0.1960s] [  0%]
2025-12-04T14:00:07.8113715Z test_sparse.py::TestSparseMeta::test_fake_SparseCSC_float64 PASSED [0.3046s] [  0%]
2025-12-04T14:00:07.8114347Z test_sparse.py::TestSparseMeta::test_fake_SparseCSR_float64 PASSED [0.3060s] [  0%]
2025-12-04T14:00:07.8114984Z test_sparse.py::TestSparseMeta::test_meta_SparseBSC_float64 PASSED [0.0027s] [  0%]
2025-12-04T14:00:07.8115610Z test_sparse.py::TestSparseMeta::test_meta_SparseBSR_float64 PASSED [0.0024s] [  0%]
2025-12-04T14:00:07.8116241Z test_sparse.py::TestSparseMeta::test_meta_SparseCOO_float64 PASSED [0.0016s] [  0%]
2025-12-04T14:00:07.8116873Z test_sparse.py::TestSparseMeta::test_meta_SparseCSC_float64 PASSED [0.0023s] [  0%]
2025-12-04T14:00:07.8117505Z test_sparse.py::TestSparseMeta::test_meta_SparseCSR_float64 PASSED [0.0023s] [  0%]
2025-12-04T14:00:07.8118172Z test_sparse.py::TestSparseMeta::test_print_meta_SparseBSC_float64 PASSED [0.0020s] [  0%]
2025-12-04T14:00:07.8118855Z test_sparse.py::TestSparseMeta::test_print_meta_SparseBSR_float64 PASSED [0.0014s] [  0%]
2025-12-04T14:00:07.8119531Z test_sparse.py::TestSparseMeta::test_print_meta_SparseCOO_float64 PASSED [0.0013s] [  0%]
2025-12-04T14:00:07.8120210Z test_sparse.py::TestSparseMeta::test_print_meta_SparseCSC_float64 PASSED [0.0013s] [  0%]
2025-12-04T14:00:07.8120892Z test_sparse.py::TestSparseMeta::test_print_meta_SparseCSR_float64 PASSED [0.0013s] [  0%]
2025-12-04T14:00:07.8121562Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSC_float64 PASSED [0.0302s] [  0%]
2025-12-04T14:00:07.8122225Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSR_float64 PASSED [0.0298s] [  0%]
2025-12-04T14:00:07.8123003Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseCOO_float64 PASSED [0.0272s] [  0%]
2025-12-04T14:00:07.8123661Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSC_float64 PASSED [0.0279s] [  0%]
2025-12-04T14:00:07.8124319Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSR_float64 PASSED [0.0278s] [  0%]
2025-12-04T14:00:07.8124991Z test_sparse.py::TestSparseMeta::test_to_meta_SparseBSC_float64 PASSED [0.0540s] [  0%]
2025-12-04T14:00:07.8125779Z test_sparse.py::TestSparseMeta::test_to_meta_SparseBSR_float64 PASSED [0.0537s] [  0%]
2025-12-04T14:00:07.8126433Z test_sparse.py::TestSparseMeta::test_to_meta_SparseCOO_float64 PASSED [0.0407s] [  1%]
2025-12-04T14:00:07.8127169Z test_sparse.py::TestSparseMeta::test_to_meta_SparseCSC_float64 PASSED [0.0495s] [  1%]
2025-12-04T14:00:07.8127839Z test_sparse.py::TestSparseMeta::test_to_meta_SparseCSR_float64 PASSED [0.0503s] [  1%]
2025-12-04T14:00:07.8128532Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSC_float64 PASSED [0.0799s] [  1%]
2025-12-04T14:00:07.8129257Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSR_float64 PASSED [0.0795s] [  1%]
2025-12-04T14:00:07.8129981Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCOO_float64 PASSED [0.0762s] [  1%]
2025-12-04T14:00:07.8130702Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSC_float64 PASSED [0.0752s] [  1%]
2025-12-04T14:00:07.8131428Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSR_float64 PASSED [0.0754s] [  1%]
2025-12-04T14:00:07.8132166Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSC_float64 PASSED [0.0540s] [  1%]
2025-12-04T14:00:07.8132888Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSR_float64 PASSED [0.0539s] [  1%]
2025-12-04T14:00:07.8134000Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCOO_float64 PASSED [0.0414s] [  1%]
2025-12-04T14:00:07.8135045Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSC_float64 PASSED [0.0500s] [  1%]
2025-12-04T14:00:07.8137250Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSR_float64 PASSED [0.0500s] [  1%]
2025-12-04T14:00:07.8138449Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex128 SKIPPED [0.0002s] (In-place abs not supported for complex tensors) [  1%]
2025-12-04T14:00:07.8139742Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex64 SKIPPED [0.0002s] (In-place abs not supported for complex tensors) [  1%]
2025-12-04T14:00:07.8140738Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float32 PASSED [0.1651s] [  1%]
2025-12-04T14:00:07.8141521Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float64 PASSED [0.1708s] [  1%]
2025-12-04T14:00:07.8142298Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int16 PASSED [0.0120s] [  1%]
2025-12-04T14:00:07.8143074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int32 PASSED [0.1643s] [  1%]
2025-12-04T14:00:07.8143846Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int64 PASSED [0.0072s] [  1%]
2025-12-04T14:00:07.8144600Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int8 PASSED [0.1636s] [  1%]
2025-12-04T14:00:07.8145375Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_uint8 PASSED [0.0073s] [  1%]
2025-12-04T14:00:07.8146168Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex128 PASSED [0.1871s] [  1%]
2025-12-04T14:00:07.8146994Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex64 PASSED [0.0094s] [  1%]
2025-12-04T14:00:07.8147794Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float32 PASSED [0.1639s] [  1%]
2025-12-04T14:00:07.8148586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float64 PASSED [0.0074s] [  1%]
2025-12-04T14:00:07.8149377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int16 PASSED [0.1635s] [  1%]
2025-12-04T14:00:07.8150165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int32 PASSED [0.0070s] [  1%]
2025-12-04T14:00:07.8150935Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int64 PASSED [0.1642s] [  1%]
2025-12-04T14:00:07.8151720Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int8 PASSED [0.0070s] [  1%]
2025-12-04T14:00:07.8152498Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_uint8 PASSED [0.0062s] [  1%]
2025-12-04T14:00:07.8153350Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex128 PASSED [0.1709s] [  2%]
2025-12-04T14:00:07.8154172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex64 PASSED [0.0077s] [  2%]
2025-12-04T14:00:07.8155024Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float32 PASSED [0.1639s] [  2%]
2025-12-04T14:00:07.8155827Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float64 PASSED [0.0073s] [  2%]
2025-12-04T14:00:07.8156612Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int16 PASSED [0.1637s] [  2%]
2025-12-04T14:00:07.8157391Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int32 PASSED [0.0070s] [  2%]
2025-12-04T14:00:07.8158172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int64 PASSED [0.1635s] [  2%]
2025-12-04T14:00:07.8158944Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int8 PASSED [0.0070s] [  2%]
2025-12-04T14:00:07.8159721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_uint8 PASSED [0.1639s] [  2%]
2025-12-04T14:00:07.8160523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex128 PASSED [0.5190s] [  2%]
2025-12-04T14:00:07.8161332Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex64 PASSED [0.8664s] [  2%]
2025-12-04T14:00:07.8162173Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float32 PASSED [0.0079s] [  2%]
2025-12-04T14:00:07.8162957Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float64 PASSED [0.1641s] [  2%]
2025-12-04T14:00:07.8163735Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int16 PASSED [0.0071s] [  2%]
2025-12-04T14:00:07.8164550Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int32 PASSED [0.1635s] [  2%]
2025-12-04T14:00:07.8165316Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int64 PASSED [0.0070s] [  2%]
2025-12-04T14:00:07.8166085Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int8 PASSED [0.1626s] [  2%]
2025-12-04T14:00:07.8166854Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_uint8 PASSED [0.0070s] [  2%]
2025-12-04T14:00:07.8167665Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex128 PASSED [0.6595s] [  2%]
2025-12-04T14:00:07.8168480Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex64 PASSED [0.7028s] [  2%]
2025-12-04T14:00:07.8169291Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float32 PASSED [0.1653s] [  2%]
2025-12-04T14:00:07.8170084Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float64 PASSED [0.0074s] [  2%]
2025-12-04T14:00:07.8170880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int16 PASSED [0.1640s] [  2%]
2025-12-04T14:00:07.8171659Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int32 PASSED [0.0070s] [  2%]
2025-12-04T14:00:07.8172440Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int64 PASSED [0.1638s] [  2%]
2025-12-04T14:00:07.8173229Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int8 PASSED [0.0070s] [  2%]
2025-12-04T14:00:07.8174018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_uint8 PASSED [0.0062s] [  2%]
2025-12-04T14:00:07.8174800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float32 PASSED [0.1700s] [  2%]
2025-12-04T14:00:07.8175593Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float64 PASSED [0.0073s] [  2%]
2025-12-04T14:00:07.8176374Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int16 PASSED [0.1639s] [  2%]
2025-12-04T14:00:07.8177150Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int32 PASSED [0.0071s] [  2%]
2025-12-04T14:00:07.8177934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int64 PASSED [0.1639s] [  3%]
2025-12-04T14:00:07.8178708Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int8 PASSED [0.0071s] [  3%]
2025-12-04T14:00:07.8179584Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_uint8 PASSED [0.1639s] [  3%]
2025-12-04T14:00:07.8180551Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex128 SKIPPED [0.0028s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8181638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex64 SKIPPED [0.0027s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8182705Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float32 SKIPPED [0.0026s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8183758Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8184792Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int16 SKIPPED [0.0025s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8185826Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8186862Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int64 SKIPPED [0.0028s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8187890Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8188955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [  3%]
2025-12-04T14:00:07.8190134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex128 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8191523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex64 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8192859Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float32 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8194183Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float64 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8195471Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int16 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8196760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int32 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8198049Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int64 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8199400Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int8 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8200687Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_uint8 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [  3%]
2025-12-04T14:00:07.8201764Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float32 PASSED [0.1647s] [  3%]
2025-12-04T14:00:07.8202608Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float64 PASSED [0.0073s] [  3%]
2025-12-04T14:00:07.8203428Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int16 PASSED [0.1640s] [  3%]
2025-12-04T14:00:07.8204237Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int32 PASSED [0.0070s] [  3%]
2025-12-04T14:00:07.8205056Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int64 PASSED [0.1639s] [  3%]
2025-12-04T14:00:07.8205869Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int8 PASSED [0.0070s] [  3%]
2025-12-04T14:00:07.8206716Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_uint8 PASSED [0.1638s] [  3%]
2025-12-04T14:00:07.8207517Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float32 PASSED [0.0103s] [  3%]
2025-12-04T14:00:07.8208687Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float64 PASSED [0.1648s] [  3%]
2025-12-04T14:00:07.8209491Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int16 PASSED [0.0070s] [  3%]
2025-12-04T14:00:07.8210300Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int32 PASSED [0.1639s] [  4%]
2025-12-04T14:00:07.8211094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int64 PASSED [0.0070s] [  4%]
2025-12-04T14:00:07.8211881Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int8 PASSED [0.1634s] [  4%]
2025-12-04T14:00:07.8212658Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_uint8 PASSED [0.0070s] [  4%]
2025-12-04T14:00:07.8213448Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float32 PASSED [0.1644s] [  4%]
2025-12-04T14:00:07.8214268Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float64 PASSED [0.2153s] [  4%]
2025-12-04T14:00:07.8215085Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int16 PASSED [0.3799s] [  4%]
2025-12-04T14:00:07.8215900Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int32 PASSED [0.0070s] [  4%]
2025-12-04T14:00:07.8216764Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int64 PASSED [0.1638s] [  4%]
2025-12-04T14:00:07.8217583Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int8 PASSED [0.0070s] [  4%]
2025-12-04T14:00:07.8218485Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_uint8 PASSED [0.0062s] [  4%]
2025-12-04T14:00:07.8219383Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex128 PASSED [0.1772s] [  4%]
2025-12-04T14:00:07.8220209Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex64 PASSED [0.0075s] [  4%]
2025-12-04T14:00:07.8221029Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float32 PASSED [0.1639s] [  4%]
2025-12-04T14:00:07.8221863Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float64 PASSED [0.0074s] [  4%]
2025-12-04T14:00:07.8222646Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int16 PASSED [0.1640s] [  4%]
2025-12-04T14:00:07.8223464Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int32 PASSED [0.0071s] [  4%]
2025-12-04T14:00:07.8224262Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int64 PASSED [0.1635s] [  4%]
2025-12-04T14:00:07.8225077Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int8 PASSED [0.0070s] [  4%]
2025-12-04T14:00:07.8225875Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_uint8 PASSED [0.1646s] [  4%]
2025-12-04T14:00:07.8226674Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float32 PASSED [0.0074s] [  4%]
2025-12-04T14:00:07.8227488Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float64 PASSED [0.1639s] [  4%]
2025-12-04T14:00:07.8228302Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int16 PASSED [0.0071s] [  4%]
2025-12-04T14:00:07.8229093Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int32 PASSED [0.1633s] [  4%]
2025-12-04T14:00:07.8229904Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int64 PASSED [0.0071s] [  4%]
2025-12-04T14:00:07.8230684Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int8 PASSED [0.1637s] [  4%]
2025-12-04T14:00:07.8231469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_uint8 PASSED [0.0070s] [  4%]
2025-12-04T14:00:07.8232281Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float32 PASSED [0.1642s] [  4%]
2025-12-04T14:00:07.8233095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float64 PASSED [0.0074s] [  4%]
2025-12-04T14:00:07.8234123Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [  4%]
2025-12-04T14:00:07.8235252Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex64 SKIPPED [0.0028s] (Skipped! Out not supported) [  4%]
2025-12-04T14:00:07.8236339Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8237412Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8238505Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int16 SKIPPED [0.0028s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8239588Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8240638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8241682Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8242742Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8243865Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8244964Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex64 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8246080Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8247148Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float64 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8248216Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8249265Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8250338Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8251389Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8252445Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_uint8 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8253514Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8254613Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float64 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8255706Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8256789Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int32 SKIPPED [0.0029s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8257865Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8259000Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8260132Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8261213Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8262361Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8263484Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8264570Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8265652Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8266730Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8267811Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [  5%]
2025-12-04T14:00:07.8268812Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex128 PASSED [0.1708s] [  5%]
2025-12-04T14:00:07.8269664Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex64 PASSED [0.0075s] [  6%]
2025-12-04T14:00:07.8270485Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float32 PASSED [0.1646s] [  6%]
2025-12-04T14:00:07.8271351Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float64 PASSED [0.0073s] [  6%]
2025-12-04T14:00:07.8272175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int16 PASSED [0.1644s] [  6%]
2025-12-04T14:00:07.8272972Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int32 PASSED [0.0070s] [  6%]
2025-12-04T14:00:07.8273806Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int64 PASSED [0.1641s] [  6%]
2025-12-04T14:00:07.8274604Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int8 PASSED [0.0070s] [  6%]
2025-12-04T14:00:07.8275401Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_uint8 PASSED [0.1640s] [  6%]
2025-12-04T14:00:07.8276233Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float32 PASSED [0.0074s] [  6%]
2025-12-04T14:00:07.8277081Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float64 PASSED [0.1648s] [  6%]
2025-12-04T14:00:07.8277927Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int16 PASSED [0.0070s] [  6%]
2025-12-04T14:00:07.8278800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int32 PASSED [0.1643s] [  6%]
2025-12-04T14:00:07.8279653Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int64 PASSED [0.0071s] [  6%]
2025-12-04T14:00:07.8280470Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int8 PASSED [0.1644s] [  6%]
2025-12-04T14:00:07.8281293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_uint8 PASSED [0.0071s] [  6%]
2025-12-04T14:00:07.8282124Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex128 PASSED [0.2274s] [  6%]
2025-12-04T14:00:07.8282931Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex64 PASSED [0.0728s] [  6%]
2025-12-04T14:00:07.8283735Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float32 PASSED [0.1730s] [  6%]
2025-12-04T14:00:07.8284531Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float64 PASSED [0.0074s] [  6%]
2025-12-04T14:00:07.8285335Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int16 PASSED [0.1661s] [  6%]
2025-12-04T14:00:07.8286111Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int32 PASSED [0.0071s] [  6%]
2025-12-04T14:00:07.8286896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int64 PASSED [0.1642s] [  6%]
2025-12-04T14:00:07.8287681Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int8 PASSED [0.0072s] [  6%]
2025-12-04T14:00:07.8288530Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_uint8 PASSED [0.1644s] [  6%]
2025-12-04T14:00:07.8289548Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8290737Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8291911Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8293078Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int32 SKIPPED [0.0027s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8294229Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8295387Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8296543Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [  6%]
2025-12-04T14:00:07.8297703Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex128 SKIPPED [0.0025s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8298915Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex64 SKIPPED [0.0025s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8300075Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8301219Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float64 SKIPPED [0.0025s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8302309Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8303391Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8304472Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8305558Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8306639Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [  7%]
2025-12-04T14:00:07.8307585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float32 PASSED [0.1643s] [  7%]
2025-12-04T14:00:07.8308561Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float64 PASSED [0.0073s] [  7%]
2025-12-04T14:00:07.8309423Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int16 PASSED [0.1641s] [  7%]
2025-12-04T14:00:07.8310294Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int32 PASSED [0.0070s] [  7%]
2025-12-04T14:00:07.8311152Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int64 PASSED [0.1645s] [  7%]
2025-12-04T14:00:07.8312006Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int8 PASSED [0.0070s] [  7%]
2025-12-04T14:00:07.8312877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_uint8 PASSED [0.1644s] [  7%]
2025-12-04T14:00:07.8313743Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float32 PASSED [0.0074s] [  7%]
2025-12-04T14:00:07.8314586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float64 PASSED [0.1650s] [  7%]
2025-12-04T14:00:07.8315442Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int16 PASSED [0.0070s] [  7%]
2025-12-04T14:00:07.8316293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int32 PASSED [0.1643s] [  7%]
2025-12-04T14:00:07.8317166Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int64 PASSED [0.0071s] [  7%]
2025-12-04T14:00:07.8318020Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int8 PASSED [0.1642s] [  7%]
2025-12-04T14:00:07.8318836Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_uint8 PASSED [0.0070s] [  7%]
2025-12-04T14:00:07.8319668Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex128 PASSED [0.2760s] [  7%]
2025-12-04T14:00:07.8320492Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex64 PASSED [0.1459s] [  7%]
2025-12-04T14:00:07.8321285Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float32 PASSED [0.1656s] [  7%]
2025-12-04T14:00:07.8322098Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float64 PASSED [0.0073s] [  7%]
2025-12-04T14:00:07.8322880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int16 PASSED [0.1650s] [  7%]
2025-12-04T14:00:07.8323667Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int32 PASSED [0.0071s] [  7%]
2025-12-04T14:00:07.8324444Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int64 PASSED [0.1647s] [  7%]
2025-12-04T14:00:07.8325222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int8 PASSED [0.0071s] [  7%]
2025-12-04T14:00:07.8326097Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_uint8 PASSED [0.1649s] [  8%]
2025-12-04T14:00:07.8326880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float32 PASSED [0.0074s] [  8%]
2025-12-04T14:00:07.8327736Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float64 PASSED [0.1653s] [  8%]
2025-12-04T14:00:07.8328544Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int16 PASSED [0.0071s] [  8%]
2025-12-04T14:00:07.8329340Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int32 PASSED [0.1649s] [  8%]
2025-12-04T14:00:07.8330131Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int64 PASSED [0.0071s] [  8%]
2025-12-04T14:00:07.8330919Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int8 PASSED [0.1650s] [  8%]
2025-12-04T14:00:07.8331708Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_uint8 PASSED [0.0070s] [  8%]
2025-12-04T14:00:07.8332648Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float32 SKIPPED [0.0026s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8346786Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8347909Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8349005Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8350046Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int64 SKIPPED [0.0028s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8351080Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8352119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [  8%]
2025-12-04T14:00:07.8353040Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex128 PASSED [0.4415s] [  8%]
2025-12-04T14:00:07.8353833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex64 PASSED [0.5103s] [  8%]
2025-12-04T14:00:07.8354608Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float32 PASSED [0.1665s] [  8%]
2025-12-04T14:00:07.8355376Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float64 PASSED [0.0073s] [  8%]
2025-12-04T14:00:07.8356222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int16 PASSED [0.1649s] [  8%]
2025-12-04T14:00:07.8356973Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int32 PASSED [0.0070s] [  8%]
2025-12-04T14:00:07.8357765Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int64 PASSED [0.1650s] [  8%]
2025-12-04T14:00:07.8358538Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int8 PASSED [0.0070s] [  8%]
2025-12-04T14:00:07.8359324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_uint8 PASSED [0.1649s] [  8%]
2025-12-04T14:00:07.8360098Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex128 PASSED [0.2794s] [  8%]
2025-12-04T14:00:07.8360895Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex64 PASSED [0.6642s] [  8%]
2025-12-04T14:00:07.8361679Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float32 PASSED [0.0080s] [  8%]
2025-12-04T14:00:07.8362459Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float64 PASSED [0.1654s] [  8%]
2025-12-04T14:00:07.8363225Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int16 PASSED [0.0070s] [  8%]
2025-12-04T14:00:07.8363987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int32 PASSED [0.1651s] [  8%]
2025-12-04T14:00:07.8364744Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int64 PASSED [0.0070s] [  8%]
2025-12-04T14:00:07.8365541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int8 PASSED [0.1652s] [  9%]
2025-12-04T14:00:07.8366291Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_uint8 PASSED [0.0070s] [  9%]
2025-12-04T14:00:07.8367179Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex128 PASSED [0.4947s] [  9%]
2025-12-04T14:00:07.8367970Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex64 PASSED [0.6225s] [  9%]
2025-12-04T14:00:07.8368784Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float32 PASSED [0.1657s] [  9%]
2025-12-04T14:00:07.8369602Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float64 PASSED [0.0074s] [  9%]
2025-12-04T14:00:07.8370507Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int16 PASSED [0.1653s] [  9%]
2025-12-04T14:00:07.8371265Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int32 PASSED [0.0070s] [  9%]
2025-12-04T14:00:07.8372035Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int64 PASSED [0.1648s] [  9%]
2025-12-04T14:00:07.8372789Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int8 PASSED [0.0070s] [  9%]
2025-12-04T14:00:07.8373545Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_uint8 PASSED [0.1651s] [  9%]
2025-12-04T14:00:07.8374320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex128 PASSED [0.0102s] [  9%]
2025-12-04T14:00:07.8375102Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex64 PASSED [0.1660s] [  9%]
2025-12-04T14:00:07.8375881Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float32 PASSED [0.0073s] [  9%]
2025-12-04T14:00:07.8376651Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float64 PASSED [0.1653s] [  9%]
2025-12-04T14:00:07.8377408Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int16 PASSED [0.0070s] [  9%]
2025-12-04T14:00:07.8378157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int32 PASSED [0.1651s] [  9%]
2025-12-04T14:00:07.8378957Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int64 PASSED [0.0070s] [  9%]
2025-12-04T14:00:07.8379798Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int8 PASSED [0.1652s] [  9%]
2025-12-04T14:00:07.8380541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_uint8 PASSED [0.0069s] [  9%]
2025-12-04T14:00:07.8381310Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex128 PASSED [0.1689s] [  9%]
2025-12-04T14:00:07.8382167Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex64 PASSED [0.0074s] [  9%]
2025-12-04T14:00:07.8382987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float32 PASSED [0.1655s] [  9%]
2025-12-04T14:00:07.8383767Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float64 PASSED [0.0074s] [  9%]
2025-12-04T14:00:07.8384529Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int16 PASSED [0.1656s] [  9%]
2025-12-04T14:00:07.8385288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int32 PASSED [0.0070s] [  9%]
2025-12-04T14:00:07.8386055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int64 PASSED [0.1653s] [  9%]
2025-12-04T14:00:07.8386815Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int8 PASSED [0.0071s] [  9%]
2025-12-04T14:00:07.8387564Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_uint8 PASSED [0.1653s] [  9%]
2025-12-04T14:00:07.8388339Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float32 PASSED [0.0074s] [  9%]
2025-12-04T14:00:07.8389183Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float64 PASSED [0.1657s] [  9%]
2025-12-04T14:00:07.8389963Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int16 PASSED [0.0070s] [ 10%]
2025-12-04T14:00:07.8390778Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int32 PASSED [0.1651s] [ 10%]
2025-12-04T14:00:07.8391555Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int64 PASSED [0.0071s] [ 10%]
2025-12-04T14:00:07.8392317Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int8 PASSED [0.1655s] [ 10%]
2025-12-04T14:00:07.8393119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_uint8 PASSED [0.0070s] [ 10%]
2025-12-04T14:00:07.8393879Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex128 PASSED [0.1664s] [ 10%]
2025-12-04T14:00:07.8394629Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex64 PASSED [0.0074s] [ 10%]
2025-12-04T14:00:07.8395366Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float32 PASSED [0.1657s] [ 10%]
2025-12-04T14:00:07.8396098Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float64 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8396828Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int16 PASSED [0.1655s] [ 10%]
2025-12-04T14:00:07.8397550Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int32 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8398260Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int64 PASSED [0.1678s] [ 10%]
2025-12-04T14:00:07.8398980Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int8 PASSED [0.0071s] [ 10%]
2025-12-04T14:00:07.8399692Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_uint8 PASSED [0.1660s] [ 10%]
2025-12-04T14:00:07.8400429Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex128 PASSED [0.0075s] [ 10%]
2025-12-04T14:00:07.8401208Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex64 PASSED [0.1655s] [ 10%]
2025-12-04T14:00:07.8401961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float32 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8402700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float64 PASSED [0.1658s] [ 10%]
2025-12-04T14:00:07.8403434Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int16 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8404155Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int32 PASSED [0.1652s] [ 10%]
2025-12-04T14:00:07.8404877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int64 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8405601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int8 PASSED [0.1662s] [ 10%]
2025-12-04T14:00:07.8406319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_uint8 PASSED [0.0095s] [ 10%]
2025-12-04T14:00:07.8407142Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex128 PASSED [0.1662s] [ 10%]
2025-12-04T14:00:07.8408182Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex64 PASSED [0.0074s] [ 10%]
2025-12-04T14:00:07.8409119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float32 PASSED [0.1658s] [ 10%]
2025-12-04T14:00:07.8409874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float64 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8410655Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int16 PASSED [0.1660s] [ 10%]
2025-12-04T14:00:07.8411506Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int32 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8412350Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int64 PASSED [0.1659s] [ 10%]
2025-12-04T14:00:07.8413096Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int8 PASSED [0.0073s] [ 10%]
2025-12-04T14:00:07.8413848Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_uint8 PASSED [0.1661s] [ 11%]
2025-12-04T14:00:07.8414621Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex128 PASSED [0.0074s] [ 11%]
2025-12-04T14:00:07.8415409Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex64 PASSED [0.1660s] [ 11%]
2025-12-04T14:00:07.8416175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float32 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8417027Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float64 PASSED [0.1664s] [ 11%]
2025-12-04T14:00:07.8417785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int16 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8418588Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int32 PASSED [0.1658s] [ 11%]
2025-12-04T14:00:07.8419460Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int64 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8420200Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int8 PASSED [0.1673s] [ 11%]
2025-12-04T14:00:07.8420947Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_uint8 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8421715Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex128 PASSED [0.1662s] [ 11%]
2025-12-04T14:00:07.8422505Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex64 PASSED [0.0074s] [ 11%]
2025-12-04T14:00:07.8423285Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float32 PASSED [0.1660s] [ 11%]
2025-12-04T14:00:07.8424051Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float64 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8424812Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int16 PASSED [0.1660s] [ 11%]
2025-12-04T14:00:07.8425565Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int32 PASSED [0.0072s] [ 11%]
2025-12-04T14:00:07.8426319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int64 PASSED [0.1660s] [ 11%]
2025-12-04T14:00:07.8427068Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int8 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8427816Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_uint8 PASSED [0.0064s] [ 11%]
2025-12-04T14:00:07.8428574Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float32 PASSED [0.1664s] [ 11%]
2025-12-04T14:00:07.8429337Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float64 PASSED [0.0073s] [ 11%]
2025-12-04T14:00:07.8430195Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int16 PASSED [0.1663s] [ 11%]
2025-12-04T14:00:07.8430941Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int32 PASSED [0.0071s] [ 11%]
2025-12-04T14:00:07.8431688Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int64 PASSED [0.1661s] [ 11%]
2025-12-04T14:00:07.8432439Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int8 PASSED [0.0071s] [ 11%]
2025-12-04T14:00:07.8433256Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_uint8 PASSED [0.1662s] [ 11%]
2025-12-04T14:00:07.8434154Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex128 SKIPPED [0.0028s] (Skipped! Out not supported) [ 11%]
2025-12-04T14:00:07.8435250Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 11%]
2025-12-04T14:00:07.8436275Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float32 SKIPPED [0.0026s] (Skipped! Out not supported) [ 11%]
2025-12-04T14:00:07.8437283Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 11%]
2025-12-04T14:00:07.8438286Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 11%]
2025-12-04T14:00:07.8439324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int32 SKIPPED [0.0026s] (Skipped! Out not supported) [ 12%]
2025-12-04T14:00:07.8440315Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 12%]
2025-12-04T14:00:07.8441301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 12%]
2025-12-04T14:00:07.8442288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_uint8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 12%]
2025-12-04T14:00:07.8443258Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex128 PASSED [0.1673s] [ 12%]
2025-12-04T14:00:07.8444115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex64 PASSED [0.0075s] [ 12%]
2025-12-04T14:00:07.8444994Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float32 PASSED [0.1672s] [ 12%]
2025-12-04T14:00:07.8445823Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float64 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8446641Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int16 PASSED [0.1662s] [ 12%]
2025-12-04T14:00:07.8447459Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int32 PASSED [0.0070s] [ 12%]
2025-12-04T14:00:07.8448266Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int64 PASSED [0.1660s] [ 12%]
2025-12-04T14:00:07.8449074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int8 PASSED [0.0071s] [ 12%]
2025-12-04T14:00:07.8449884Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_uint8 PASSED [0.1658s] [ 12%]
2025-12-04T14:00:07.8450682Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float32 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8451467Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float64 PASSED [0.1664s] [ 12%]
2025-12-04T14:00:07.8452241Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int16 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8453014Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int32 PASSED [0.1667s] [ 12%]
2025-12-04T14:00:07.8453776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int64 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8454540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int8 PASSED [0.1662s] [ 12%]
2025-12-04T14:00:07.8455312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_uint8 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8456083Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float32 PASSED [0.1668s] [ 12%]
2025-12-04T14:00:07.8456842Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float64 PASSED [0.0074s] [ 12%]
2025-12-04T14:00:07.8457607Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int16 PASSED [0.1667s] [ 12%]
2025-12-04T14:00:07.8458357Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int32 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8459168Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int64 PASSED [0.1663s] [ 12%]
2025-12-04T14:00:07.8459963Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int8 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8460716Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_uint8 PASSED [0.1665s] [ 12%]
2025-12-04T14:00:07.8461535Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float32 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8462324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float64 PASSED [0.1663s] [ 12%]
2025-12-04T14:00:07.8463111Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int16 PASSED [0.0073s] [ 12%]
2025-12-04T14:00:07.8463895Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int32 PASSED [0.1647s] [ 12%]
2025-12-04T14:00:07.8464670Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int64 PASSED [0.0074s] [ 13%]
2025-12-04T14:00:07.8465435Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int8 PASSED [0.1671s] [ 13%]
2025-12-04T14:00:07.8466215Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_uint8 PASSED [0.0070s] [ 13%]
2025-12-04T14:00:07.8467013Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex128 PASSED [0.1669s] [ 13%]
2025-12-04T14:00:07.8467824Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex64 PASSED [0.0074s] [ 13%]
2025-12-04T14:00:07.8468645Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float32 PASSED [0.1662s] [ 13%]
2025-12-04T14:00:07.8469434Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float64 PASSED [0.0073s] [ 13%]
2025-12-04T14:00:07.8470209Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int16 PASSED [0.1664s] [ 13%]
2025-12-04T14:00:07.8471018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int32 PASSED [0.0073s] [ 13%]
2025-12-04T14:00:07.8471777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int64 PASSED [0.1661s] [ 13%]
2025-12-04T14:00:07.8472551Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int8 PASSED [0.0073s] [ 13%]
2025-12-04T14:00:07.8473329Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_uint8 PASSED [0.1667s] [ 13%]
2025-12-04T14:00:07.8474094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float32 PASSED [0.0073s] [ 13%]
2025-12-04T14:00:07.8474878Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float64 PASSED [0.1664s] [ 13%]
2025-12-04T14:00:07.8475660Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int16 PASSED [0.0071s] [ 13%]
2025-12-04T14:00:07.8476420Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int32 PASSED [0.1663s] [ 13%]
2025-12-04T14:00:07.8477175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int64 PASSED [0.0071s] [ 13%]
2025-12-04T14:00:07.8477934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int8 PASSED [0.1663s] [ 13%]
2025-12-04T14:00:07.8478699Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_uint8 PASSED [0.0071s] [ 13%]
2025-12-04T14:00:07.8479469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float32 PASSED [0.1663s] [ 13%]
2025-12-04T14:00:07.8480239Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float64 PASSED [0.0073s] [ 13%]
2025-12-04T14:00:07.8481159Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8482230Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex64 SKIPPED [0.0028s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8483278Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8484304Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8485324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int16 SKIPPED [0.0027s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8486384Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8487438Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8488439Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8489499Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8490541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex128 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%]
2025-12-04T14:00:07.8491602Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8492640Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8493678Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8494700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int16 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8495756Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8496760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8497839Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8498891Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%]
2025-12-04T14:00:07.8499874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float32 PASSED [0.1814s] [ 14%]
2025-12-04T14:00:07.8500685Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float64 PASSED [0.0072s] [ 14%]
2025-12-04T14:00:07.8501488Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int16 PASSED [0.1669s] [ 14%]
2025-12-04T14:00:07.8502271Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int32 PASSED [0.0071s] [ 14%]
2025-12-04T14:00:07.8503065Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int64 PASSED [0.1666s] [ 14%]
2025-12-04T14:00:07.8503847Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int8 PASSED [0.0071s] [ 14%]
2025-12-04T14:00:07.8504647Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_uint8 PASSED [0.1668s] [ 14%]
2025-12-04T14:00:07.8505452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float32 PASSED [0.0072s] [ 14%]
2025-12-04T14:00:07.8506246Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float64 PASSED [0.1665s] [ 14%]
2025-12-04T14:00:07.8507033Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int16 PASSED [0.0071s] [ 14%]
2025-12-04T14:00:07.8507973Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int32 PASSED [0.1668s] [ 14%]
2025-12-04T14:00:07.8508785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int64 PASSED [0.0071s] [ 14%]
2025-12-04T14:00:07.8509588Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int8 PASSED [0.1671s] [ 14%]
2025-12-04T14:00:07.8510367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_uint8 PASSED [0.0071s] [ 14%]
2025-12-04T14:00:07.8511163Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex128 PASSED [0.1671s] [ 14%]
2025-12-04T14:00:07.8511965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex64 PASSED [0.0074s] [ 14%]
2025-12-04T14:00:07.8512751Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float32 PASSED [0.1669s] [ 14%]
2025-12-04T14:00:07.8513599Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float64 PASSED [0.0073s] [ 14%]
2025-12-04T14:00:07.8514426Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int16 PASSED [0.1670s] [ 14%]
2025-12-04T14:00:07.8515187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int32 PASSED [0.0073s] [ 14%]
2025-12-04T14:00:07.8515944Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int64 PASSED [0.1670s] [ 14%]
2025-12-04T14:00:07.8516701Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int8 PASSED [0.0073s] [ 14%]
2025-12-04T14:00:07.8517461Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_uint8 PASSED [0.1669s] [ 14%]
2025-12-04T14:00:07.8518244Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float32 PASSED [0.0073s] [ 15%]
2025-12-04T14:00:07.8519054Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float64 PASSED [0.1674s] [ 15%]
2025-12-04T14:00:07.8519867Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int16 PASSED [0.0071s] [ 15%]
2025-12-04T14:00:07.8520660Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int32 PASSED [0.1671s] [ 15%]
2025-12-04T14:00:07.8521451Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int64 PASSED [0.0071s] [ 15%]
2025-12-04T14:00:07.8522299Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int8 PASSED [0.1669s] [ 15%]
2025-12-04T14:00:07.8523097Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_uint8 PASSED [0.0072s] [ 15%]
2025-12-04T14:00:07.8523892Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex128 PASSED [0.1676s] [ 15%]
2025-12-04T14:00:07.8524747Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex64 PASSED [0.0075s] [ 15%]
2025-12-04T14:00:07.8525512Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float32 PASSED [0.1673s] [ 15%]
2025-12-04T14:00:07.8526277Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float64 PASSED [0.0073s] [ 15%]
2025-12-04T14:00:07.8527029Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int16 PASSED [0.1677s] [ 15%]
2025-12-04T14:00:07.8527773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int32 PASSED [0.0072s] [ 15%]
2025-12-04T14:00:07.8528512Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int64 PASSED [0.1672s] [ 15%]
2025-12-04T14:00:07.8529257Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int8 PASSED [0.0072s] [ 15%]
2025-12-04T14:00:07.8529989Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_uint8 PASSED [0.1679s] [ 15%]
2025-12-04T14:00:07.8530937Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8532074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8533203Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int16 SKIPPED [0.0025s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8534311Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int32 SKIPPED [0.0027s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8535427Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8536540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8537656Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8538784Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8540010Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8541129Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8542194Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8543250Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8544283Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8545324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8546362Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%]
2025-12-04T14:00:07.8547402Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [ 16%]
2025-12-04T14:00:07.8548316Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float32 PASSED [0.1678s] [ 16%]
2025-12-04T14:00:07.8549113Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float64 PASSED [0.0074s] [ 16%]
2025-12-04T14:00:07.8549941Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int16 PASSED [0.1677s] [ 16%]
2025-12-04T14:00:07.8550719Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int32 PASSED [0.0073s] [ 16%]
2025-12-04T14:00:07.8551527Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int64 PASSED [0.1674s] [ 16%]
2025-12-04T14:00:07.8552294Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int8 PASSED [0.0073s] [ 16%]
2025-12-04T14:00:07.8553061Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_uint8 PASSED [0.1675s] [ 16%]
2025-12-04T14:00:07.8553833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float32 PASSED [0.0073s] [ 16%]
2025-12-04T14:00:07.8554607Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float64 PASSED [0.1675s] [ 16%]
2025-12-04T14:00:07.8555367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int16 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8556125Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int32 PASSED [0.1673s] [ 16%]
2025-12-04T14:00:07.8556872Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int64 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8557631Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int8 PASSED [0.1670s] [ 16%]
2025-12-04T14:00:07.8558387Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_uint8 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8559159Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex128 PASSED [0.1676s] [ 16%]
2025-12-04T14:00:07.8559934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex64 PASSED [0.0074s] [ 16%]
2025-12-04T14:00:07.8560696Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float32 PASSED [0.1676s] [ 16%]
2025-12-04T14:00:07.8561452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float64 PASSED [0.0073s] [ 16%]
2025-12-04T14:00:07.8562203Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int16 PASSED [0.1673s] [ 16%]
2025-12-04T14:00:07.8562950Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int32 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8563690Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int64 PASSED [0.1675s] [ 16%]
2025-12-04T14:00:07.8564435Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int8 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8565172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_uint8 PASSED [0.1671s] [ 16%]
2025-12-04T14:00:07.8565933Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float32 PASSED [0.0074s] [ 16%]
2025-12-04T14:00:07.8566762Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float64 PASSED [0.1675s] [ 16%]
2025-12-04T14:00:07.8567563Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int16 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8568311Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int32 PASSED [0.1676s] [ 16%]
2025-12-04T14:00:07.8569058Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int64 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8569804Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int8 PASSED [0.1675s] [ 16%]
2025-12-04T14:00:07.8570552Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_uint8 PASSED [0.0071s] [ 16%]
2025-12-04T14:00:07.8571316Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float32 PASSED [0.1675s] [ 17%]
2025-12-04T14:00:07.8572108Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float64 PASSED [0.0072s] [ 17%]
2025-12-04T14:00:07.8572893Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int16 PASSED [0.1674s] [ 17%]
2025-12-04T14:00:07.8573666Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int32 PASSED [0.0071s] [ 17%]
2025-12-04T14:00:07.8574441Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int64 PASSED [0.1672s] [ 17%]
2025-12-04T14:00:07.8575258Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int8 PASSED [0.0071s] [ 17%]
2025-12-04T14:00:07.8576032Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_uint8 PASSED [0.1676s] [ 17%]
2025-12-04T14:00:07.8576805Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex128 PASSED [0.0074s] [ 17%]
2025-12-04T14:00:07.8577626Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex64 PASSED [0.1679s] [ 17%]
2025-12-04T14:00:07.8578393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float32 PASSED [0.0073s] [ 17%]
2025-12-04T14:00:07.8579271Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float64 PASSED [0.1681s] [ 17%]
2025-12-04T14:00:07.8580021Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int16 PASSED [0.0074s] [ 17%]
2025-12-04T14:00:07.8580771Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int32 PASSED [0.1674s] [ 17%]
2025-12-04T14:00:07.8581520Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int64 PASSED [0.0073s] [ 17%]
2025-12-04T14:00:07.8582254Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int8 PASSED [0.1678s] [ 17%]
2025-12-04T14:00:07.8582998Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_uint8 PASSED [0.0073s] [ 17%]
2025-12-04T14:00:07.8583774Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex128 PASSED [0.1677s] [ 17%]
2025-12-04T14:00:07.8584560Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex64 PASSED [0.0074s] [ 17%]
2025-12-04T14:00:07.8585327Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float32 PASSED [0.1678s] [ 17%]
2025-12-04T14:00:07.8586095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float64 PASSED [0.0073s] [ 17%]
2025-12-04T14:00:07.8586852Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int16 PASSED [0.1678s] [ 17%]
2025-12-04T14:00:07.8587601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int32 PASSED [0.0073s] [ 17%]
2025-12-04T14:00:07.8588343Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int64 PASSED [0.1678s] [ 17%]
2025-12-04T14:00:07.8589094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int8 PASSED [0.0073s] [ 17%]
2025-12-04T14:00:07.8589857Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_uint8 PASSED [0.1678s] [ 17%]
2025-12-04T14:00:07.8590636Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex128 PASSED [0.0074s] [ 17%]
2025-12-04T14:00:07.8591424Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex64 PASSED [0.1676s] [ 17%]
2025-12-04T14:00:07.8592253Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float32 PASSED [0.0074s] [ 17%]
2025-12-04T14:00:07.8593023Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float64 PASSED [0.1679s] [ 17%]
2025-12-04T14:00:07.8593820Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int16 PASSED [0.0074s] [ 17%]
2025-12-04T14:00:07.8594584Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int32 PASSED [0.1678s] [ 17%]
2025-12-04T14:00:07.8595329Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int64 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8596082Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int8 PASSED [0.1681s] [ 18%]
2025-12-04T14:00:07.8596833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_uint8 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8597597Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex128 PASSED [0.1682s] [ 18%]
2025-12-04T14:00:07.8598386Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex64 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8599204Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float32 PASSED [0.1682s] [ 18%]
2025-12-04T14:00:07.8599965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float64 PASSED [0.0074s] [ 18%]
2025-12-04T14:00:07.8600721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int16 PASSED [0.1675s] [ 18%]
2025-12-04T14:00:07.8601533Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int32 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8602271Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int64 PASSED [0.1679s] [ 18%]
2025-12-04T14:00:07.8603049Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int8 PASSED [0.0074s] [ 18%]
2025-12-04T14:00:07.8603787Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_uint8 PASSED [0.1681s] [ 18%]
2025-12-04T14:00:07.8604566Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex128 PASSED [0.0074s] [ 18%]
2025-12-04T14:00:07.8605354Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex64 PASSED [0.1681s] [ 18%]
2025-12-04T14:00:07.8606139Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float32 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8606905Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float64 PASSED [0.1681s] [ 18%]
2025-12-04T14:00:07.8607664Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int16 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8608669Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int32 PASSED [0.1687s] [ 18%]
2025-12-04T14:00:07.8609397Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int64 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8610127Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int8 PASSED [0.1677s] [ 18%]
2025-12-04T14:00:07.8610856Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_uint8 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8611592Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float32 PASSED [0.1681s] [ 18%]
2025-12-04T14:00:07.8612345Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float64 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8613093Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int16 PASSED [0.1680s] [ 18%]
2025-12-04T14:00:07.8613838Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int32 PASSED [0.0071s] [ 18%]
2025-12-04T14:00:07.8614580Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int64 PASSED [0.1679s] [ 18%]
2025-12-04T14:00:07.8615319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int8 PASSED [0.0071s] [ 18%]
2025-12-04T14:00:07.8616056Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_uint8 PASSED [0.1680s] [ 18%]
2025-12-04T14:00:07.8616874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex128 PASSED [0.0084s] [ 18%]
2025-12-04T14:00:07.8617783Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex64 PASSED [0.1684s] [ 18%]
2025-12-04T14:00:07.8618785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float32 PASSED [0.0073s] [ 18%]
2025-12-04T14:00:07.8619807Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float64 PASSED [0.1683s] [ 19%]
2025-12-04T14:00:07.8620680Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int16 PASSED [0.0071s] [ 19%]
2025-12-04T14:00:07.8621545Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int32 PASSED [0.1680s] [ 19%]
2025-12-04T14:00:07.8622402Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int64 PASSED [0.0071s] [ 19%]
2025-12-04T14:00:07.8623252Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int8 PASSED [0.1681s] [ 19%]
2025-12-04T14:00:07.8624095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_uint8 PASSED [0.0071s] [ 19%]
2025-12-04T14:00:07.8624985Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex128 PASSED [0.1689s] [ 19%]
2025-12-04T14:00:07.8625895Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex64 PASSED [0.0074s] [ 19%]
2025-12-04T14:00:07.8626792Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float32 PASSED [0.1683s] [ 19%]
2025-12-04T14:00:07.8627726Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float64 PASSED [0.0073s] [ 19%]
2025-12-04T14:00:07.8628639Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int16 PASSED [0.1684s] [ 19%]
2025-12-04T14:00:07.8630356Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int32 PASSED [0.0073s] [ 19%]
2025-12-04T14:00:07.8631216Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int64 PASSED [0.1683s] [ 19%]
2025-12-04T14:00:07.8632088Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int8 PASSED [0.0073s] [ 19%]
2025-12-04T14:00:07.8632969Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_uint8 PASSED [0.1682s] [ 19%]
2025-12-04T14:00:07.8633875Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex128 PASSED [0.0073s] [ 19%]
2025-12-04T14:00:07.8634802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex64 PASSED [0.1682s] [ 19%]
2025-12-04T14:00:07.8635712Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float32 PASSED [0.0072s] [ 19%]
2025-12-04T14:00:07.8636613Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float64 PASSED [0.1683s] [ 19%]
2025-12-04T14:00:07.8637510Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int16 PASSED [0.0073s] [ 19%]
2025-12-04T14:00:07.8638380Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int32 PASSED [0.1684s] [ 19%]
2025-12-04T14:00:07.8639318Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int64 PASSED [0.0072s] [ 19%]
2025-12-04T14:00:07.8640196Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int8 PASSED [0.1681s] [ 19%]
2025-12-04T14:00:07.8641080Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_uint8 PASSED [0.0072s] [ 19%]
2025-12-04T14:00:07.8641966Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex128 PASSED [0.1684s] [ 19%]
2025-12-04T14:00:07.8642881Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex64 PASSED [0.0073s] [ 19%]
2025-12-04T14:00:07.8643791Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float32 PASSED [0.1682s] [ 19%]
2025-12-04T14:00:07.8644679Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float64 PASSED [0.0072s] [ 19%]
2025-12-04T14:00:07.8645608Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int16 PASSED [0.1686s] [ 19%]
2025-12-04T14:00:07.8646482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int32 PASSED [0.0072s] [ 19%]
2025-12-04T14:00:07.8647398Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int64 PASSED [0.1684s] [ 19%]
2025-12-04T14:00:07.8648257Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int8 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8649103Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_uint8 PASSED [0.1685s] [ 20%]
2025-12-04T14:00:07.8650008Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex128 PASSED [0.0073s] [ 20%]
2025-12-04T14:00:07.8650940Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex64 PASSED [0.1684s] [ 20%]
2025-12-04T14:00:07.8651849Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float32 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8652744Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float64 PASSED [0.1684s] [ 20%]
2025-12-04T14:00:07.8653629Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int16 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8654563Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int32 PASSED [0.1687s] [ 20%]
2025-12-04T14:00:07.8655436Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int64 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8656295Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int8 PASSED [0.1687s] [ 20%]
2025-12-04T14:00:07.8666401Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_uint8 PASSED [0.0070s] [ 20%]
2025-12-04T14:00:07.8667295Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float32 PASSED [0.1685s] [ 20%]
2025-12-04T14:00:07.8668179Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float64 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8669104Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int16 PASSED [0.1686s] [ 20%]
2025-12-04T14:00:07.8669961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int32 PASSED [0.0070s] [ 20%]
2025-12-04T14:00:07.8670817Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int64 PASSED [0.1683s] [ 20%]
2025-12-04T14:00:07.8671666Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int8 PASSED [0.0070s] [ 20%]
2025-12-04T14:00:07.8672513Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_uint8 PASSED [0.1685s] [ 20%]
2025-12-04T14:00:07.8673393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex128 PASSED [0.0074s] [ 20%]
2025-12-04T14:00:07.8674293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex64 PASSED [0.1688s] [ 20%]
2025-12-04T14:00:07.8675178Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float32 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8676050Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float64 PASSED [0.1684s] [ 20%]
2025-12-04T14:00:07.8676916Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int16 PASSED [0.0069s] [ 20%]
2025-12-04T14:00:07.8677776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int32 PASSED [0.1685s] [ 20%]
2025-12-04T14:00:07.8678622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int64 PASSED [0.0069s] [ 20%]
2025-12-04T14:00:07.8679475Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int8 PASSED [0.1678s] [ 20%]
2025-12-04T14:00:07.8680320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_uint8 PASSED [0.0070s] [ 20%]
2025-12-04T14:00:07.8681326Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex128 PASSED [0.1689s] [ 20%]
2025-12-04T14:00:07.8682351Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex64 PASSED [0.0073s] [ 20%]
2025-12-04T14:00:07.8683322Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float32 PASSED [0.1687s] [ 20%]
2025-12-04T14:00:07.8684279Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float64 PASSED [0.0072s] [ 20%]
2025-12-04T14:00:07.8685226Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int16 PASSED [0.1686s] [ 21%]
2025-12-04T14:00:07.8686161Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int32 PASSED [0.0070s] [ 21%]
2025-12-04T14:00:07.8687094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int64 PASSED [0.1686s] [ 21%]
2025-12-04T14:00:07.8688029Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int8 PASSED [0.0070s] [ 21%]
2025-12-04T14:00:07.8689016Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_uint8 PASSED [0.1686s] [ 21%]
2025-12-04T14:00:07.8689931Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float32 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8690879Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float64 PASSED [0.1685s] [ 21%]
2025-12-04T14:00:07.8691778Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int16 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8692703Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int32 PASSED [0.1687s] [ 21%]
2025-12-04T14:00:07.8693579Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int64 PASSED [0.0073s] [ 21%]
2025-12-04T14:00:07.8694456Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int8 PASSED [0.1690s] [ 21%]
2025-12-04T14:00:07.8695331Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_uint8 PASSED [0.0073s] [ 21%]
2025-12-04T14:00:07.8696213Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float32 PASSED [0.1688s] [ 21%]
2025-12-04T14:00:07.8697076Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float64 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8697928Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int16 PASSED [0.1682s] [ 21%]
2025-12-04T14:00:07.8698780Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int32 PASSED [0.0073s] [ 21%]
2025-12-04T14:00:07.8699705Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int64 PASSED [0.1690s] [ 21%]
2025-12-04T14:00:07.8700539Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int8 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8701380Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_uint8 PASSED [0.1688s] [ 21%]
2025-12-04T14:00:07.8702243Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float32 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8703128Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float64 PASSED [0.1690s] [ 21%]
2025-12-04T14:00:07.8704009Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int16 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8704878Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int32 PASSED [0.1691s] [ 21%]
2025-12-04T14:00:07.8705754Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int64 PASSED [0.0072s] [ 21%]
2025-12-04T14:00:07.8706623Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int8 PASSED [0.1688s] [ 21%]
2025-12-04T14:00:07.8707569Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_uint8 PASSED [0.0070s] [ 21%]
2025-12-04T14:00:07.8708700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex128 PASSED [0.1692s] [ 21%]
2025-12-04T14:00:07.8709696Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex64 PASSED [0.0073s] [ 21%]
2025-12-04T14:00:07.8710586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float32 PASSED [0.1686s] [ 21%]
2025-12-04T14:00:07.8711470Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float64 PASSED [0.0073s] [ 21%]
2025-12-04T14:00:07.8712384Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int16 PASSED [0.1689s] [ 21%]
2025-12-04T14:00:07.8713471Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int32 PASSED [0.0072s] [ 22%]
2025-12-04T14:00:07.8714334Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int64 PASSED [0.1681s] [ 22%]
2025-12-04T14:00:07.8715256Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int8 PASSED [0.0072s] [ 22%]
2025-12-04T14:00:07.8716173Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_uint8 PASSED [0.1682s] [ 22%]
2025-12-04T14:00:07.8717196Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float32 PASSED [0.0072s] [ 22%]
2025-12-04T14:00:07.8718079Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float64 PASSED [0.1682s] [ 22%]
2025-12-04T14:00:07.8718948Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int16 PASSED [0.0070s] [ 22%]
2025-12-04T14:00:07.8719869Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int32 PASSED [0.1681s] [ 22%]
2025-12-04T14:00:07.8720731Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int64 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8721585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int8 PASSED [0.1674s] [ 22%]
2025-12-04T14:00:07.8722442Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_uint8 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8723308Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float32 PASSED [0.1682s] [ 22%]
2025-12-04T14:00:07.8724177Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float64 PASSED [0.0072s] [ 22%]
2025-12-04T14:00:07.8725068Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex128 PASSED [0.1684s] [ 22%]
2025-12-04T14:00:07.8725977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex64 PASSED [0.0072s] [ 22%]
2025-12-04T14:00:07.8726871Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float32 PASSED [0.1679s] [ 22%]
2025-12-04T14:00:07.8727760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float64 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8728634Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int16 PASSED [0.1690s] [ 22%]
2025-12-04T14:00:07.8729498Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int32 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8730361Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int64 PASSED [0.1686s] [ 22%]
2025-12-04T14:00:07.8731214Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int8 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8732074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_uint8 PASSED [0.1686s] [ 22%]
2025-12-04T14:00:07.8732961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex128 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8733866Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex64 PASSED [0.1693s] [ 22%]
2025-12-04T14:00:07.8734820Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float32 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8735750Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float64 PASSED [0.1691s] [ 22%]
2025-12-04T14:00:07.8736622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int16 PASSED [0.0071s] [ 22%]
2025-12-04T14:00:07.8737487Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int32 PASSED [0.1682s] [ 22%]
2025-12-04T14:00:07.8738342Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int64 PASSED [0.0070s] [ 22%]
2025-12-04T14:00:07.8739262Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int8 PASSED [0.1691s] [ 22%]
2025-12-04T14:00:07.8740126Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_uint8 PASSED [0.0070s] [ 22%]
2025-12-04T14:00:07.8741017Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float32 PASSED [0.1688s] [ 23%]
2025-12-04T14:00:07.8741923Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float64 PASSED [0.0072s] [ 23%]
2025-12-04T14:00:07.8742823Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int16 PASSED [0.1691s] [ 23%]
2025-12-04T14:00:07.8743759Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int32 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8744650Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int64 PASSED [0.1693s] [ 23%]
2025-12-04T14:00:07.8745572Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int8 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8746455Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_uint8 PASSED [0.1694s] [ 23%]
2025-12-04T14:00:07.8747353Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float32 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8748263Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float64 PASSED [0.1693s] [ 23%]
2025-12-04T14:00:07.8749165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int16 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8750057Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int32 PASSED [0.1695s] [ 23%]
2025-12-04T14:00:07.8750948Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int64 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8751833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int8 PASSED [0.1687s] [ 23%]
2025-12-04T14:00:07.8752721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_uint8 PASSED [0.0072s] [ 23%]
2025-12-04T14:00:07.8753617Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex128 PASSED [0.1690s] [ 23%]
2025-12-04T14:00:07.8754526Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex64 PASSED [0.0073s] [ 23%]
2025-12-04T14:00:07.8755417Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float32 PASSED [0.1696s] [ 23%]
2025-12-04T14:00:07.8756292Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float64 PASSED [0.0072s] [ 23%]
2025-12-04T14:00:07.8757161Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int16 PASSED [0.1692s] [ 23%]
2025-12-04T14:00:07.8758027Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int32 PASSED [0.0072s] [ 23%]
2025-12-04T14:00:07.8758942Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int64 PASSED [0.1696s] [ 23%]
2025-12-04T14:00:07.8759793Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int8 PASSED [0.0072s] [ 23%]
2025-12-04T14:00:07.8760704Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_uint8 PASSED [0.1693s] [ 23%]
2025-12-04T14:00:07.8761600Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float32 PASSED [0.0072s] [ 23%]
2025-12-04T14:00:07.8762566Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float64 PASSED [0.1695s] [ 23%]
2025-12-04T14:00:07.8763481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int16 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8764388Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int32 PASSED [0.1695s] [ 23%]
2025-12-04T14:00:07.8765296Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int64 PASSED [0.0071s] [ 23%]
2025-12-04T14:00:07.8766190Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int8 PASSED [0.1697s] [ 23%]
2025-12-04T14:00:07.8767092Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_uint8 PASSED [0.0070s] [ 23%]
2025-12-04T14:00:07.8767991Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex128 PASSED [0.1695s] [ 23%]
2025-12-04T14:00:07.8768885Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex64 PASSED [0.0073s] [ 24%]
2025-12-04T14:00:07.8769798Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float32 PASSED [0.1698s] [ 24%]
2025-12-04T14:00:07.8770666Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float64 PASSED [0.0073s] [ 24%]
2025-12-04T14:00:07.8771516Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int16 PASSED [0.1694s] [ 24%]
2025-12-04T14:00:07.8772404Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int32 PASSED [0.0071s] [ 24%]
2025-12-04T14:00:07.8773240Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int64 PASSED [0.1697s] [ 24%]
2025-12-04T14:00:07.8774083Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int8 PASSED [0.0071s] [ 24%]
2025-12-04T14:00:07.8774925Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_uint8 PASSED [0.1695s] [ 24%]
2025-12-04T14:00:07.8775854Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float32 PASSED [0.0072s] [ 24%]
2025-12-04T14:00:07.8776851Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float64 PASSED [0.1697s] [ 24%]
2025-12-04T14:00:07.8777841Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int16 PASSED [0.0070s] [ 24%]
2025-12-04T14:00:07.8778870Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int32 PASSED [0.1695s] [ 24%]
2025-12-04T14:00:07.8779902Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int64 PASSED [0.0069s] [ 24%]
2025-12-04T14:00:07.8780874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int8 PASSED [0.1696s] [ 24%]
2025-12-04T14:00:07.8781849Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_uint8 PASSED [0.0069s] [ 24%]
2025-12-04T14:00:07.8782808Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex128 PASSED [0.1689s] [ 24%]
2025-12-04T14:00:07.8783750Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex64 PASSED [0.0073s] [ 24%]
2025-12-04T14:00:07.8784669Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float32 PASSED [0.1698s] [ 24%]
2025-12-04T14:00:07.8785581Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float64 PASSED [0.0072s] [ 24%]
2025-12-04T14:00:07.8786487Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int16 PASSED [0.1696s] [ 24%]
2025-12-04T14:00:07.8787426Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int32 PASSED [0.0070s] [ 24%]
2025-12-04T14:00:07.8788431Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int64 PASSED [0.1694s] [ 24%]
2025-12-04T14:00:07.8789368Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int8 PASSED [0.0069s] [ 24%]
2025-12-04T14:00:07.8790261Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_uint8 PASSED [0.1695s] [ 24%]
2025-12-04T14:00:07.8791157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float32 PASSED [0.0072s] [ 24%]
2025-12-04T14:00:07.8792057Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float64 PASSED [0.1697s] [ 24%]
2025-12-04T14:00:07.8792946Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int16 PASSED [0.0072s] [ 24%]
2025-12-04T14:00:07.8793828Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int32 PASSED [0.1694s] [ 24%]
2025-12-04T14:00:07.8794819Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int64 PASSED [0.0073s] [ 24%]
2025-12-04T14:00:07.8795699Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int8 PASSED [0.1699s] [ 24%]
2025-12-04T14:00:07.8796639Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_uint8 PASSED [0.0072s] [ 24%]
2025-12-04T14:00:07.8797522Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float32 PASSED [0.1700s] [ 25%]
2025-12-04T14:00:07.8798472Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float64 PASSED [0.0073s] [ 25%]
2025-12-04T14:00:07.8799393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int16 PASSED [0.1689s] [ 25%]
2025-12-04T14:00:07.8800259Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int32 PASSED [0.0070s] [ 25%]
2025-12-04T14:00:07.8801123Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int64 PASSED [0.1699s] [ 25%]
2025-12-04T14:00:07.8801986Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int8 PASSED [0.0070s] [ 25%]
2025-12-04T14:00:07.8802840Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_uint8 PASSED [0.1697s] [ 25%]
2025-12-04T14:00:07.8803723Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex128 PASSED [0.0073s] [ 25%]
2025-12-04T14:00:07.8804611Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex64 PASSED [0.1702s] [ 25%]
2025-12-04T14:00:07.8805484Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float32 PASSED [0.0072s] [ 25%]
2025-12-04T14:00:07.8806349Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float64 PASSED [0.1701s] [ 25%]
2025-12-04T14:00:07.8807207Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int16 PASSED [0.0071s] [ 25%]
2025-12-04T14:00:07.8808312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int32 PASSED [0.1697s] [ 25%]
2025-12-04T14:00:07.8809212Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int64 PASSED [0.0070s] [ 25%]
2025-12-04T14:00:07.8810055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int8 PASSED [0.1699s] [ 25%]
2025-12-04T14:00:07.8810896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_uint8 PASSED [0.0070s] [ 25%]
2025-12-04T14:00:07.8811757Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float32 PASSED [0.1698s] [ 25%]
2025-12-04T14:00:07.8812625Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float64 PASSED [0.0073s] [ 25%]
2025-12-04T14:00:07.8813490Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int16 PASSED [0.1697s] [ 25%]
2025-12-04T14:00:07.8814437Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int32 PASSED [0.0070s] [ 25%]
2025-12-04T14:00:07.8815343Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int64 PASSED [0.1700s] [ 25%]
2025-12-04T14:00:07.8816188Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int8 PASSED [0.0070s] [ 25%]
2025-12-04T14:00:07.8817039Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_uint8 PASSED [0.1701s] [ 25%]
2025-12-04T14:00:07.8817914Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float32 PASSED [0.0071s] [ 25%]
2025-12-04T14:00:07.8818821Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float64 PASSED [0.1698s] [ 25%]
2025-12-04T14:00:07.8819799Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int16 PASSED [0.0071s] [ 25%]
2025-12-04T14:00:07.8820681Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int32 PASSED [0.1698s] [ 25%]
2025-12-04T14:00:07.8821562Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int64 PASSED [0.0071s] [ 25%]
2025-12-04T14:00:07.8822439Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int8 PASSED [0.1700s] [ 25%]
2025-12-04T14:00:07.8823374Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_uint8 PASSED [0.0071s] [ 25%]
2025-12-04T14:00:07.8824259Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex128 PASSED [0.1701s] [ 25%]
2025-12-04T14:00:07.8825196Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex64 PASSED [0.0073s] [ 26%]
2025-12-04T14:00:07.8826072Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float32 PASSED [0.1705s] [ 26%]
2025-12-04T14:00:07.8826929Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float64 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8827783Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int16 PASSED [0.1704s] [ 26%]
2025-12-04T14:00:07.8828630Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int32 PASSED [0.0074s] [ 26%]
2025-12-04T14:00:07.8829473Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int64 PASSED [0.1704s] [ 26%]
2025-12-04T14:00:07.8830308Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int8 PASSED [0.0074s] [ 26%]
2025-12-04T14:00:07.8831143Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_uint8 PASSED [0.1701s] [ 26%]
2025-12-04T14:00:07.8832014Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex128 PASSED [0.0073s] [ 26%]
2025-12-04T14:00:07.8832917Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex64 PASSED [0.1701s] [ 26%]
2025-12-04T14:00:07.8833809Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float32 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8834685Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float64 PASSED [0.1703s] [ 26%]
2025-12-04T14:00:07.8835546Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int16 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8836393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int32 PASSED [0.1703s] [ 26%]
2025-12-04T14:00:07.8837242Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int64 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8838092Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int8 PASSED [0.1699s] [ 26%]
2025-12-04T14:00:07.8838939Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_uint8 PASSED [0.0073s] [ 26%]
2025-12-04T14:00:07.8839813Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex128 PASSED [0.1703s] [ 26%]
2025-12-04T14:00:07.8840765Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex64 PASSED [0.0073s] [ 26%]
2025-12-04T14:00:07.8841692Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float32 PASSED [0.1700s] [ 26%]
2025-12-04T14:00:07.8842571Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float64 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8843431Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int16 PASSED [0.1705s] [ 26%]
2025-12-04T14:00:07.8844288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int32 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8845146Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int64 PASSED [0.1705s] [ 26%]
2025-12-04T14:00:07.8846000Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int8 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8846852Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_uint8 PASSED [0.1708s] [ 26%]
2025-12-04T14:00:07.8847729Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex128 PASSED [0.0073s] [ 26%]
2025-12-04T14:00:07.8848623Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex64 PASSED [0.1705s] [ 26%]
2025-12-04T14:00:07.8849540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float32 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8850406Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float64 PASSED [0.1708s] [ 26%]
2025-12-04T14:00:07.8851302Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int16 PASSED [0.0072s] [ 26%]
2025-12-04T14:00:07.8852147Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int32 PASSED [0.1705s] [ 27%]
2025-12-04T14:00:07.8852987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int64 PASSED [0.0072s] [ 27%]
2025-12-04T14:00:07.8853823Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int8 PASSED [0.1703s] [ 27%]
2025-12-04T14:00:07.8854665Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_uint8 PASSED [0.0072s] [ 27%]
2025-12-04T14:00:07.8855538Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex128 PASSED [0.1709s] [ 27%]
2025-12-04T14:00:07.8856434Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex64 PASSED [0.0073s] [ 27%]
2025-12-04T14:00:07.8857312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float32 PASSED [0.1705s] [ 27%]
2025-12-04T14:00:07.8858190Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float64 PASSED [0.0073s] [ 27%]
2025-12-04T14:00:07.8859158Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int16 PASSED [0.1704s] [ 27%]
2025-12-04T14:00:07.8860009Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int32 PASSED [0.0073s] [ 27%]
2025-12-04T14:00:07.8860865Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int64 PASSED [0.1706s] [ 27%]
2025-12-04T14:00:07.8861717Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int8 PASSED [0.0073s] [ 27%]
2025-12-04T14:00:07.8862569Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_uint8 PASSED [0.1708s] [ 27%]
2025-12-04T14:00:07.8863432Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float32 PASSED [0.0073s] [ 27%]
2025-12-04T14:00:07.8864320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float64 PASSED [0.1707s] [ 27%]
2025-12-04T14:00:07.8865189Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int16 PASSED [0.0071s] [ 27%]
2025-12-04T14:00:07.8866059Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int32 PASSED [0.1708s] [ 27%]
2025-12-04T14:00:07.8866972Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int64 PASSED [0.0071s] [ 27%]
2025-12-04T14:00:07.8867870Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int8 PASSED [0.1702s] [ 27%]
2025-12-04T14:00:07.8868730Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_uint8 PASSED [0.0071s] [ 27%]
2025-12-04T14:00:07.8869581Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_complex128 PASSED [0.7683s] [ 27%]
2025-12-04T14:00:07.8870402Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_float64 PASSED [0.5865s] [ 27%]
2025-12-04T14:00:07.8871414Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8872591Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8873776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8874961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8876187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8877364Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8878586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8879774Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%]
2025-12-04T14:00:07.8880773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_ceil_cuda_float64 PASSED [0.4675s] [ 27%]
2025-12-04T14:00:07.8881614Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_complex128 PASSED [0.2537s] [ 28%]
2025-12-04T14:00:07.8882460Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_float64 PASSED [0.0341s] [ 28%]
2025-12-04T14:00:07.8883338Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_complex128 PASSED [1.4354s] [ 28%]
2025-12-04T14:00:07.8884249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_float64 PASSED [0.4534s] [ 28%]
2025-12-04T14:00:07.8885120Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_deg2rad_cuda_float64 PASSED [0.5208s] [ 28%]
2025-12-04T14:00:07.8886120Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erf_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%]
2025-12-04T14:00:07.8887116Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erfinv_cuda_float64 PASSED [0.6833s] [ 28%]
2025-12-04T14:00:07.8888133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%]
2025-12-04T14:00:07.8889321Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%]
2025-12-04T14:00:07.8890325Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_floor_cuda_float64 PASSED [0.4654s] [ 28%]
2025-12-04T14:00:07.8891170Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_frac_cuda_float64 PASSED [0.4871s] [ 28%]
2025-12-04T14:00:07.8892174Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_complex128 SKIPPED [0.0027s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8893392Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_float64 SKIPPED [0.0025s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8894599Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_complex128 SKIPPED [0.0028s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8895769Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_float64 SKIPPED [0.0025s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8896933Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isneginf_cuda_float64 SKIPPED [0.0025s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8898119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isposinf_cuda_float64 SKIPPED [0.0027s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8899187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_complex128 PASSED [1.5048s] [ 28%]
2025-12-04T14:00:07.8900037Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_float64 PASSED [0.5618s] [ 28%]
2025-12-04T14:00:07.8901047Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nan_to_num_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%]
2025-12-04T14:00:07.8902134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_complex128 PASSED [1.4003s] [ 28%]
2025-12-04T14:00:07.8902965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_float64 PASSED [0.5186s] [ 28%]
2025-12-04T14:00:07.8903853Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nn_functional_relu_cuda_float64 PASSED [0.3764s] [ 28%]
2025-12-04T14:00:07.8904811Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_complex128 PASSED [1.1605s] [ 28%]
2025-12-04T14:00:07.8905684Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_float64 PASSED [0.4378s] [ 28%]
2025-12-04T14:00:07.8906534Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_rad2deg_cuda_float64 PASSED [0.4994s] [ 28%]
2025-12-04T14:00:07.8907378Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_round_cuda_float64 PASSED [0.4523s] [ 28%]
2025-12-04T14:00:07.8908532Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%]
2025-12-04T14:00:07.8909699Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%]
2025-12-04T14:00:07.8910688Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sign_cuda_float64 PASSED [0.4572s] [ 28%]
2025-12-04T14:00:07.8911690Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_signbit_cuda_float64 SKIPPED [0.0027s] (Skipped! Op doesn't support autograd) [ 28%]
2025-12-04T14:00:07.8912867Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8914038Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8915205Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8916385Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8917561Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_complex128 SKIPPED [0.0004s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8918732Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8919977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8921194Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8922374Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8923548Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%]
2025-12-04T14:00:07.8924555Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_trunc_cuda_float64 PASSED [0.4545s] [ 29%]
2025-12-04T14:00:07.8925407Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex128 PASSED [0.0035s] [ 29%]
2025-12-04T14:00:07.8926276Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex64 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8927127Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float32 PASSED [0.0033s] [ 29%]
2025-12-04T14:00:07.8927971Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float64 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8928905Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int16 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8929726Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int32 PASSED [0.0033s] [ 29%]
2025-12-04T14:00:07.8930542Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int64 PASSED [0.0030s] [ 29%]
2025-12-04T14:00:07.8931422Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int8 PASSED [0.0029s] [ 29%]
2025-12-04T14:00:07.8932241Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_uint8 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8933085Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex128 PASSED [0.0030s] [ 29%]
2025-12-04T14:00:07.8933949Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex64 PASSED [0.0029s] [ 29%]
2025-12-04T14:00:07.8934800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float32 PASSED [0.0032s] [ 29%]
2025-12-04T14:00:07.8935641Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float64 PASSED [0.0029s] [ 29%]
2025-12-04T14:00:07.8936468Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int16 PASSED [0.0029s] [ 29%]
2025-12-04T14:00:07.8937297Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int32 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8938121Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int64 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8938942Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int8 PASSED [0.0032s] [ 29%]
2025-12-04T14:00:07.8939806Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_uint8 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8940661Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex128 PASSED [0.0034s] [ 29%]
2025-12-04T14:00:07.8941534Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex64 PASSED [0.0031s] [ 29%]
2025-12-04T14:00:07.8942398Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float32 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8943249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float64 PASSED [0.0033s] [ 30%]
2025-12-04T14:00:07.8944100Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int16 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8944933Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int32 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8945815Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int64 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8946638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int8 PASSED [0.0032s] [ 30%]
2025-12-04T14:00:07.8947510Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_uint8 PASSED [0.0030s] [ 30%]
2025-12-04T14:00:07.8948362Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex128 PASSED [0.0030s] [ 30%]
2025-12-04T14:00:07.8949277Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex64 PASSED [0.0033s] [ 30%]
2025-12-04T14:00:07.8950127Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float32 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8950965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float64 PASSED [0.0031s] [ 30%]
2025-12-04T14:00:07.8951802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int16 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8952626Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int32 PASSED [0.0032s] [ 30%]
2025-12-04T14:00:07.8953449Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int64 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8954269Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int8 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8955134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_uint8 PASSED [0.0032s] [ 30%]
2025-12-04T14:00:07.8955987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex128 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8956899Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex64 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8957760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float32 PASSED [0.0034s] [ 30%]
2025-12-04T14:00:07.8958615Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float64 PASSED [0.0030s] [ 30%]
2025-12-04T14:00:07.8959456Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int16 PASSED [0.0030s] [ 30%]
2025-12-04T14:00:07.8960291Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int32 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8961132Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int64 PASSED [0.0032s] [ 30%]
2025-12-04T14:00:07.8961965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int8 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8962794Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_uint8 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8963638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float32 PASSED [0.0030s] [ 30%]
2025-12-04T14:00:07.8964482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float64 PASSED [0.0033s] [ 30%]
2025-12-04T14:00:07.8965319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int16 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8966149Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int32 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8966980Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int64 PASSED [0.0029s] [ 30%]
2025-12-04T14:00:07.8967802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int8 PASSED [0.0032s] [ 30%]
2025-12-04T14:00:07.8968653Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_uint8 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8969523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex128 PASSED [0.0031s] [ 31%]
2025-12-04T14:00:07.8970395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex64 PASSED [0.0030s] [ 31%]
2025-12-04T14:00:07.8971243Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float32 PASSED [0.0032s] [ 31%]
2025-12-04T14:00:07.8972133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float64 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8973005Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int16 PASSED [0.0028s] [ 31%]
2025-12-04T14:00:07.8973834Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int32 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8974663Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int64 PASSED [0.0032s] [ 31%]
2025-12-04T14:00:07.8975482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int8 PASSED [0.0028s] [ 31%]
2025-12-04T14:00:07.8976301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_uint8 PASSED [0.0028s] [ 31%]
2025-12-04T14:00:07.8977193Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex128 PASSED [0.0030s] [ 31%]
2025-12-04T14:00:07.8978141Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex64 PASSED [0.0033s] [ 31%]
2025-12-04T14:00:07.8979133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float32 PASSED [0.0030s] [ 31%]
2025-12-04T14:00:07.8980053Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float64 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8981008Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int16 PASSED [0.0028s] [ 31%]
2025-12-04T14:00:07.8981915Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int32 PASSED [0.0031s] [ 31%]
2025-12-04T14:00:07.8982855Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int64 PASSED [0.0028s] [ 31%]
2025-12-04T14:00:07.8983744Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int8 PASSED [0.0028s] [ 31%]
2025-12-04T14:00:07.8984648Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_uint8 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8985532Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float32 PASSED [0.0033s] [ 31%]
2025-12-04T14:00:07.8986395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float64 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8987257Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int16 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8988109Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int32 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8989018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int64 PASSED [0.0033s] [ 31%]
2025-12-04T14:00:07.8989859Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int8 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8990707Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_uint8 PASSED [0.0030s] [ 31%]
2025-12-04T14:00:07.8991553Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float32 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8992399Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float64 PASSED [0.0033s] [ 31%]
2025-12-04T14:00:07.8993228Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int16 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8994047Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int32 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8994861Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int64 PASSED [0.0029s] [ 31%]
2025-12-04T14:00:07.8995677Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int8 PASSED [0.0032s] [ 32%]
2025-12-04T14:00:07.8996491Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_uint8 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.8997379Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float32 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.8998243Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.8999220Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int16 PASSED [0.0034s] [ 32%]
2025-12-04T14:00:07.9000066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int32 PASSED [0.0030s] [ 32%]
2025-12-04T14:00:07.9000907Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int64 PASSED [0.0030s] [ 32%]
2025-12-04T14:00:07.9001755Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int8 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9007162Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_uint8 PASSED [0.0032s] [ 32%]
2025-12-04T14:00:07.9008367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex128 PASSED [0.0030s] [ 32%]
2025-12-04T14:00:07.9009313Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9010186Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float32 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9011037Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float64 PASSED [0.0033s] [ 32%]
2025-12-04T14:00:07.9011989Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int16 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9012826Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int32 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9013721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9014544Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int8 PASSED [0.0033s] [ 32%]
2025-12-04T14:00:07.9015377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_uint8 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9016217Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float32 PASSED [0.0030s] [ 32%]
2025-12-04T14:00:07.9017074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9017918Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int16 PASSED [0.0032s] [ 32%]
2025-12-04T14:00:07.9018802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int32 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9019695Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9020531Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int8 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9021360Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_uint8 PASSED [0.0032s] [ 32%]
2025-12-04T14:00:07.9022198Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float32 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9023040Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9023896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex128 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9024767Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex64 PASSED [0.0033s] [ 32%]
2025-12-04T14:00:07.9025629Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float32 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9026478Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float64 PASSED [0.0029s] [ 32%]
2025-12-04T14:00:07.9027318Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int16 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9028217Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int32 PASSED [0.0031s] [ 33%]
2025-12-04T14:00:07.9028604Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int64 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9029032Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int8 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9029399Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_uint8 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9029776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex128 PASSED [0.0032s] [ 33%]
2025-12-04T14:00:07.9030157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex64 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9030524Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float32 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9030885Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float64 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9031241Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int16 PASSED [0.0032s] [ 33%]
2025-12-04T14:00:07.9031601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int32 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9031955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int64 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9032355Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int8 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9032711Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_uint8 PASSED [0.0032s] [ 33%]
2025-12-04T14:00:07.9033137Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float32 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9033519Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float64 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9033888Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int16 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9034256Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int32 PASSED [0.0032s] [ 33%]
2025-12-04T14:00:07.9034623Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int64 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9034991Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int8 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9035362Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_uint8 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9035739Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float32 PASSED [0.0033s] [ 33%]
2025-12-04T14:00:07.9036115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float64 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9036481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int16 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9036848Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int32 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9037220Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int64 PASSED [0.0032s] [ 33%]
2025-12-04T14:00:07.9037586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int8 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9037955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_uint8 PASSED [0.0028s] [ 33%]
2025-12-04T14:00:07.9038333Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex128 PASSED [0.0030s] [ 33%]
2025-12-04T14:00:07.9038707Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex64 PASSED [0.0033s] [ 33%]
2025-12-04T14:00:07.9039072Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float32 PASSED [0.0029s] [ 33%]
2025-12-04T14:00:07.9039479Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float64 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9039880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int16 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9040236Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int32 PASSED [0.0032s] [ 34%]
2025-12-04T14:00:07.9040594Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int64 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9040946Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int8 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9041301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_uint8 PASSED [0.0030s] [ 34%]
2025-12-04T14:00:07.9041686Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float32 PASSED [0.0033s] [ 34%]
2025-12-04T14:00:07.9042066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float64 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9042441Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int16 PASSED [0.0028s] [ 34%]
2025-12-04T14:00:07.9042818Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int32 PASSED [0.0028s] [ 34%]
2025-12-04T14:00:07.9043230Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int64 PASSED [0.0032s] [ 34%]
2025-12-04T14:00:07.9043605Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int8 PASSED [0.0028s] [ 34%]
2025-12-04T14:00:07.9043977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_uint8 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9044385Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex128 PASSED [0.0030s] [ 34%]
2025-12-04T14:00:07.9044751Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex64 PASSED [0.0033s] [ 34%]
2025-12-04T14:00:07.9045109Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float32 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9045465Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float64 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9045817Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int16 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9046165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int32 PASSED [0.0032s] [ 34%]
2025-12-04T14:00:07.9046512Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int64 PASSED [0.0028s] [ 34%]
2025-12-04T14:00:07.9046856Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int8 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9047199Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_uint8 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9047622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float32 PASSED [0.0033s] [ 34%]
2025-12-04T14:00:07.9048043Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float64 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9048462Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int16 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9048923Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int32 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9049333Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int64 PASSED [0.0032s] [ 34%]
2025-12-04T14:00:07.9049743Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int8 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9050151Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_uint8 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9050548Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex128 PASSED [0.0029s] [ 34%]
2025-12-04T14:00:07.9050979Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex64 PASSED [0.0033s] [ 34%]
2025-12-04T14:00:07.9051395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float32 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9051778Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float64 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9052146Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int16 PASSED [0.0028s] [ 35%]
2025-12-04T14:00:07.9052514Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int32 PASSED [0.0032s] [ 35%]
2025-12-04T14:00:07.9052882Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int64 PASSED [0.0028s] [ 35%]
2025-12-04T14:00:07.9053249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int8 PASSED [0.0028s] [ 35%]
2025-12-04T14:00:07.9053621Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_uint8 PASSED [0.0028s] [ 35%]
2025-12-04T14:00:07.9053992Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float32 PASSED [0.0034s] [ 35%]
2025-12-04T14:00:07.9054364Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float64 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9054766Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int16 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9055128Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int32 PASSED [0.0030s] [ 35%]
2025-12-04T14:00:07.9055561Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int64 PASSED [0.0033s] [ 35%]
2025-12-04T14:00:07.9055919Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int8 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9056287Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_uint8 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9056647Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float32 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9057011Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float64 PASSED [0.0033s] [ 35%]
2025-12-04T14:00:07.9057378Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int16 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9057731Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int32 PASSED [0.0028s] [ 35%]
2025-12-04T14:00:07.9058092Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int64 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9058443Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int8 PASSED [0.0032s] [ 35%]
2025-12-04T14:00:07.9058834Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_uint8 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9059288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex128 PASSED [0.0030s] [ 35%]
2025-12-04T14:00:07.9059656Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex64 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9060014Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float32 PASSED [0.0033s] [ 35%]
2025-12-04T14:00:07.9060370Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float64 PASSED [0.0030s] [ 35%]
2025-12-04T14:00:07.9060714Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int16 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9061068Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int32 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9061417Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int64 PASSED [0.0032s] [ 35%]
2025-12-04T14:00:07.9061763Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int8 PASSED [0.0029s] [ 35%]
2025-12-04T14:00:07.9062160Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_uint8 PASSED [0.0028s] [ 35%]
2025-12-04T14:00:07.9062558Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float32 PASSED [0.0030s] [ 35%]
2025-12-04T14:00:07.9062923Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float64 PASSED [0.0033s] [ 36%]
2025-12-04T14:00:07.9063273Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int16 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9063622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int32 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9063977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int64 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9064320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int8 PASSED [0.0032s] [ 36%]
2025-12-04T14:00:07.9064678Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_uint8 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9065052Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float32 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9065425Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float64 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9065836Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int16 PASSED [0.0032s] [ 36%]
2025-12-04T14:00:07.9066199Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int32 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9066602Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int64 PASSED [0.0028s] [ 36%]
2025-12-04T14:00:07.9066963Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int8 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9067325Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_uint8 PASSED [0.0032s] [ 36%]
2025-12-04T14:00:07.9067697Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex128 PASSED [0.0030s] [ 36%]
2025-12-04T14:00:07.9068061Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex64 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9068423Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float32 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9068776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float64 PASSED [0.0033s] [ 36%]
2025-12-04T14:00:07.9069118Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int16 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9069469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int32 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9069811Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int64 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9070157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int8 PASSED [0.0033s] [ 36%]
2025-12-04T14:00:07.9070506Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_uint8 PASSED [0.0030s] [ 36%]
2025-12-04T14:00:07.9070877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex128 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9071249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex64 PASSED [0.0030s] [ 36%]
2025-12-04T14:00:07.9071606Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float32 PASSED [0.0034s] [ 36%]
2025-12-04T14:00:07.9071968Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float64 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9072322Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int16 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9072672Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int32 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9073066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int64 PASSED [0.0033s] [ 36%]
2025-12-04T14:00:07.9073449Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int8 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9073800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_uint8 PASSED [0.0029s] [ 36%]
2025-12-04T14:00:07.9074176Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex128 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9074541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex64 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9074907Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float32 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9075263Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9075613Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int16 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9075967Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int32 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9076315Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9076706Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int8 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9077055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_uint8 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9077468Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex128 PASSED [0.0034s] [ 37%]
2025-12-04T14:00:07.9077833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9078187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float32 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9078559Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9078943Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int16 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9079287Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int32 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9079636Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9079979Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int8 PASSED [0.0030s] [ 37%]
2025-12-04T14:00:07.9080329Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_uint8 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9080701Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex128 PASSED [0.0030s] [ 37%]
2025-12-04T14:00:07.9081066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9081433Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float32 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9081795Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float64 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9082150Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int16 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9082497Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int32 PASSED [0.0030s] [ 37%]
2025-12-04T14:00:07.9082845Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9083195Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int8 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9083544Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_uint8 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9083950Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float32 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9084357Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float64 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9084714Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int16 PASSED [0.0033s] [ 37%]
2025-12-04T14:00:07.9085072Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int32 PASSED [0.0029s] [ 37%]
2025-12-04T14:00:07.9085425Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int64 PASSED [0.0029s] [ 38%]
2025-12-04T14:00:07.9085777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int8 PASSED [0.0029s] [ 38%]
2025-12-04T14:00:07.9086134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_uint8 PASSED [0.0033s] [ 38%]
2025-12-04T14:00:07.9086484Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex128 PASSED [0.1911s] [ 38%]
2025-12-04T14:00:07.9086840Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex64 PASSED [0.0067s] [ 38%]
2025-12-04T14:00:07.9087180Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float32 PASSED [0.1742s] [ 38%]
2025-12-04T14:00:07.9087520Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float64 PASSED [0.0065s] [ 38%]
2025-12-04T14:00:07.9087896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int16 PASSED [0.1743s] [ 38%]
2025-12-04T14:00:07.9088230Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int32 PASSED [0.0063s] [ 38%]
2025-12-04T14:00:07.9088669Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int64 PASSED [0.1733s] [ 38%]
2025-12-04T14:00:07.9089040Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int8 PASSED [0.0064s] [ 38%]
2025-12-04T14:00:07.9089367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_uint8 PASSED [0.1728s] [ 38%]
2025-12-04T14:00:07.9089729Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex128 PASSED [0.0066s] [ 38%]
2025-12-04T14:00:07.9090083Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex64 PASSED [0.1730s] [ 38%]
2025-12-04T14:00:07.9090430Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float32 PASSED [0.0065s] [ 38%]
2025-12-04T14:00:07.9090773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float64 PASSED [0.1741s] [ 38%]
2025-12-04T14:00:07.9091107Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int16 PASSED [0.0066s] [ 38%]
2025-12-04T14:00:07.9091452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int32 PASSED [0.1739s] [ 38%]
2025-12-04T14:00:07.9091783Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int64 PASSED [0.0066s] [ 38%]
2025-12-04T14:00:07.9092116Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int8 PASSED [0.1738s] [ 38%]
2025-12-04T14:00:07.9092454Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_uint8 PASSED [0.0066s] [ 38%]
2025-12-04T14:00:07.9092815Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex128 PASSED [0.1740s] [ 38%]
2025-12-04T14:00:07.9093176Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex64 PASSED [0.0067s] [ 38%]
2025-12-04T14:00:07.9093523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float32 PASSED [0.1739s] [ 38%]
2025-12-04T14:00:07.9093870Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float64 PASSED [0.0066s] [ 38%]
2025-12-04T14:00:07.9094213Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int16 PASSED [0.1738s] [ 38%]
2025-12-04T14:00:07.9094550Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int32 PASSED [0.0065s] [ 38%]
2025-12-04T14:00:07.9094936Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int64 PASSED [0.1740s] [ 38%]
2025-12-04T14:00:07.9095312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int8 PASSED [0.0066s] [ 38%]
2025-12-04T14:00:07.9095655Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_uint8 PASSED [0.1735s] [ 38%]
2025-12-04T14:00:07.9096018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex128 PASSED [0.0067s] [ 38%]
2025-12-04T14:00:07.9096370Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex64 PASSED [0.1740s] [ 39%]
2025-12-04T14:00:07.9096719Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float32 PASSED [0.0067s] [ 39%]
2025-12-04T14:00:07.9097062Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float64 PASSED [0.1740s] [ 39%]
2025-12-04T14:00:07.9097395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int16 PASSED [0.0065s] [ 39%]
2025-12-04T14:00:07.9097733Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int32 PASSED [0.1737s] [ 39%]
2025-12-04T14:00:07.9098067Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int64 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9098401Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int8 PASSED [0.1740s] [ 39%]
2025-12-04T14:00:07.9098803Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_uint8 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9099227Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex128 PASSED [0.1749s] [ 39%]
2025-12-04T14:00:07.9099734Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex64 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9100110Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float32 PASSED [0.1739s] [ 39%]
2025-12-04T14:00:07.9100481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float64 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9100849Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int16 PASSED [0.1742s] [ 39%]
2025-12-04T14:00:07.9101214Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int32 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9101578Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int64 PASSED [0.1739s] [ 39%]
2025-12-04T14:00:07.9101938Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int8 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9102301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_uint8 PASSED [0.1738s] [ 39%]
2025-12-04T14:00:07.9102675Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float32 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9103042Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float64 PASSED [0.1743s] [ 39%]
2025-12-04T14:00:07.9103404Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int16 PASSED [0.0064s] [ 39%]
2025-12-04T14:00:07.9103761Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int32 PASSED [0.1742s] [ 39%]
2025-12-04T14:00:07.9104121Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int64 PASSED [0.0063s] [ 39%]
2025-12-04T14:00:07.9104480Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int8 PASSED [0.1736s] [ 39%]
2025-12-04T14:00:07.9104837Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_uint8 PASSED [0.0064s] [ 39%]
2025-12-04T14:00:07.9105222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex128 PASSED [0.1743s] [ 39%]
2025-12-04T14:00:07.9105601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex64 PASSED [0.0066s] [ 39%]
2025-12-04T14:00:07.9105968Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float32 PASSED [0.1742s] [ 39%]
2025-12-04T14:00:07.9106383Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float64 PASSED [0.0065s] [ 39%]
2025-12-04T14:00:07.9106741Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int16 PASSED [0.1737s] [ 39%]
2025-12-04T14:00:07.9107142Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int32 PASSED [0.0063s] [ 39%]
2025-12-04T14:00:07.9107502Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int64 PASSED [0.1740s] [ 39%]
2025-12-04T14:00:07.9108015Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int8 PASSED [0.0063s] [ 40%]
2025-12-04T14:00:07.9108352Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_uint8 PASSED [0.1738s] [ 40%]
2025-12-04T14:00:07.9108801Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex128 PASSED [0.0066s] [ 40%]
2025-12-04T14:00:07.9109201Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex64 PASSED [0.1743s] [ 40%]
2025-12-04T14:00:07.9109585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float32 PASSED [0.0065s] [ 40%]
2025-12-04T14:00:07.9109969Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float64 PASSED [0.1738s] [ 40%]
2025-12-04T14:00:07.9110342Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int16 PASSED [0.0063s] [ 40%]
2025-12-04T14:00:07.9110785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int32 PASSED [0.1737s] [ 40%]
2025-12-04T14:00:07.9111159Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int64 PASSED [0.0063s] [ 40%]
2025-12-04T14:00:07.9111585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int8 PASSED [0.1741s] [ 40%]
2025-12-04T14:00:07.9111956Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_uint8 PASSED [0.0063s] [ 40%]
2025-12-04T14:00:07.9112321Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float32 PASSED [0.1745s] [ 40%]
2025-12-04T14:00:07.9112680Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float64 PASSED [0.0066s] [ 40%]
2025-12-04T14:00:07.9113026Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int16 PASSED [0.1743s] [ 40%]
2025-12-04T14:00:07.9113377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int32 PASSED [0.0065s] [ 40%]
2025-12-04T14:00:07.9113722Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int64 PASSED [0.1735s] [ 40%]
2025-12-04T14:00:07.9114070Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int8 PASSED [0.0066s] [ 40%]
2025-12-04T14:00:07.9114417Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_uint8 PASSED [0.1738s] [ 40%]
2025-12-04T14:00:07.9114755Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float32 PASSED [0.0066s] [ 40%]
2025-12-04T14:00:07.9115099Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float64 PASSED [0.1742s] [ 40%]
2025-12-04T14:00:07.9115432Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int16 PASSED [0.0065s] [ 40%]
2025-12-04T14:00:07.9115764Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int32 PASSED [0.1745s] [ 40%]
2025-12-04T14:00:07.9116094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int64 PASSED [0.0066s] [ 40%]
2025-12-04T14:00:07.9116421Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int8 PASSED [0.1742s] [ 40%]
2025-12-04T14:00:07.9116759Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_uint8 PASSED [0.0066s] [ 40%]
2025-12-04T14:00:07.9117108Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float32 PASSED [0.1740s] [ 40%]
2025-12-04T14:00:07.9117462Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float64 PASSED [0.0065s] [ 40%]
2025-12-04T14:00:07.9117864Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int16 PASSED [0.1745s] [ 40%]
2025-12-04T14:00:07.9118266Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int32 PASSED [0.0065s] [ 40%]
2025-12-04T14:00:07.9118619Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int64 PASSED [0.1749s] [ 40%]
2025-12-04T14:00:07.9119003Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int8 PASSED [0.0065s] [ 40%]
2025-12-04T14:00:07.9119348Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_uint8 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9119710Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex128 PASSED [0.0066s] [ 41%]
2025-12-04T14:00:07.9120065Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex64 PASSED [0.1745s] [ 41%]
2025-12-04T14:00:07.9120414Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float32 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9120762Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float64 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9121101Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int16 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9121482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int32 PASSED [0.1741s] [ 41%]
2025-12-04T14:00:07.9121821Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int64 PASSED [0.0066s] [ 41%]
2025-12-04T14:00:07.9122158Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int8 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9122537Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_uint8 PASSED [0.0066s] [ 41%]
2025-12-04T14:00:07.9122883Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float32 PASSED [0.1747s] [ 41%]
2025-12-04T14:00:07.9123235Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float64 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9123574Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int16 PASSED [0.1750s] [ 41%]
2025-12-04T14:00:07.9123915Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int32 PASSED [0.0063s] [ 41%]
2025-12-04T14:00:07.9124254Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int64 PASSED [0.1745s] [ 41%]
2025-12-04T14:00:07.9124587Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int8 PASSED [0.0063s] [ 41%]
2025-12-04T14:00:07.9124927Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_uint8 PASSED [0.1739s] [ 41%]
2025-12-04T14:00:07.9125273Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float32 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9125618Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float64 PASSED [0.1747s] [ 41%]
2025-12-04T14:00:07.9125981Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex128 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9126341Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex64 PASSED [0.1742s] [ 41%]
2025-12-04T14:00:07.9126690Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float32 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9127037Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float64 PASSED [0.1748s] [ 41%]
2025-12-04T14:00:07.9127377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int16 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9127718Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int32 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9128055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int64 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9128390Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int8 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9128773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_uint8 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9129177Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex128 PASSED [0.1745s] [ 41%]
2025-12-04T14:00:07.9129540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex64 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9129887Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float32 PASSED [0.1749s] [ 41%]
2025-12-04T14:00:07.9130236Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float64 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9130577Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int16 PASSED [0.1743s] [ 42%]
2025-12-04T14:00:07.9130914Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int32 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9131253Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int64 PASSED [0.1749s] [ 42%]
2025-12-04T14:00:07.9131590Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int8 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9131934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_uint8 PASSED [0.1748s] [ 42%]
2025-12-04T14:00:07.9132337Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float32 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9132700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float64 PASSED [0.1747s] [ 42%]
2025-12-04T14:00:07.9133056Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int16 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9133447Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int32 PASSED [0.1747s] [ 42%]
2025-12-04T14:00:07.9133800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int64 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9134149Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int8 PASSED [0.1745s] [ 42%]
2025-12-04T14:00:07.9134500Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_uint8 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9134863Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float32 PASSED [0.1751s] [ 42%]
2025-12-04T14:00:07.9135222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float64 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9135573Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int16 PASSED [0.1748s] [ 42%]
2025-12-04T14:00:07.9135925Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int32 PASSED [0.0063s] [ 42%]
2025-12-04T14:00:07.9136274Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int64 PASSED [0.1747s] [ 42%]
2025-12-04T14:00:07.9136625Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int8 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9136976Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_uint8 PASSED [0.1748s] [ 42%]
2025-12-04T14:00:07.9137342Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex128 PASSED [0.0066s] [ 42%]
2025-12-04T14:00:07.9137697Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex64 PASSED [0.1751s] [ 42%]
2025-12-04T14:00:07.9138043Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float32 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9138391Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float64 PASSED [0.1752s] [ 42%]
2025-12-04T14:00:07.9138760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int16 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9139167Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int32 PASSED [0.1750s] [ 42%]
2025-12-04T14:00:07.9139508Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int64 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9139892Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int8 PASSED [0.1750s] [ 42%]
2025-12-04T14:00:07.9140296Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_uint8 PASSED [0.0066s] [ 42%]
2025-12-04T14:00:07.9140670Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float32 PASSED [0.1749s] [ 42%]
2025-12-04T14:00:07.9141036Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float64 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9141396Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int16 PASSED [0.1747s] [ 43%]
2025-12-04T14:00:07.9141754Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int32 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9142113Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int64 PASSED [0.1749s] [ 43%]
2025-12-04T14:00:07.9142469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int8 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9142826Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_uint8 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9143180Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex128 PASSED [0.0066s] [ 43%]
2025-12-04T14:00:07.9143570Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex64 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9143912Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float32 PASSED [0.0065s] [ 43%]
2025-12-04T14:00:07.9144290Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float64 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9144620Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int16 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9144960Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int32 PASSED [0.1750s] [ 43%]
2025-12-04T14:00:07.9145293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int64 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9145633Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int8 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9145967Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_uint8 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9146375Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float32 PASSED [0.1754s] [ 43%]
2025-12-04T14:00:07.9146787Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float64 PASSED [0.0066s] [ 43%]
2025-12-04T14:00:07.9147186Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int16 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9147585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int32 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9147985Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int64 PASSED [0.1751s] [ 43%]
2025-12-04T14:00:07.9148383Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int8 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9148780Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_uint8 PASSED [0.1751s] [ 43%]
2025-12-04T14:00:07.9149165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex128 PASSED [0.0065s] [ 43%]
2025-12-04T14:00:07.9149546Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex64 PASSED [0.1753s] [ 43%]
2025-12-04T14:00:07.9149914Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float32 PASSED [0.0065s] [ 43%]
2025-12-04T14:00:07.9150274Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float64 PASSED [0.1750s] [ 43%]
2025-12-04T14:00:07.9150628Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int16 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9151026Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int32 PASSED [0.1755s] [ 43%]
2025-12-04T14:00:07.9151415Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int64 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9151777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int8 PASSED [0.1754s] [ 43%]
2025-12-04T14:00:07.9152133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_uint8 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9152491Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float32 PASSED [0.1749s] [ 43%]
2025-12-04T14:00:07.9152854Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float64 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9153201Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int16 PASSED [0.1754s] [ 44%]
2025-12-04T14:00:07.9153554Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int32 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9153899Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int64 PASSED [0.1754s] [ 44%]
2025-12-04T14:00:07.9154254Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int8 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9154642Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_uint8 PASSED [0.1757s] [ 44%]
2025-12-04T14:00:07.9154991Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float32 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9155386Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float64 PASSED [0.1755s] [ 44%]
2025-12-04T14:00:07.9155723Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int16 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9156073Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int32 PASSED [0.1752s] [ 44%]
2025-12-04T14:00:07.9156413Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int64 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9156752Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int8 PASSED [0.1751s] [ 44%]
2025-12-04T14:00:07.9157095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_uint8 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9157454Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex128 PASSED [0.1757s] [ 44%]
2025-12-04T14:00:07.9157803Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex64 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9158146Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float32 PASSED [0.1754s] [ 44%]
2025-12-04T14:00:07.9158486Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float64 PASSED [0.0065s] [ 44%]
2025-12-04T14:00:07.9158820Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int16 PASSED [0.1753s] [ 44%]
2025-12-04T14:00:07.9159152Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int32 PASSED [0.0063s] [ 44%]
2025-12-04T14:00:07.9159489Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int64 PASSED [0.1755s] [ 44%]
2025-12-04T14:00:07.9159822Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int8 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9160151Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_uint8 PASSED [0.1748s] [ 44%]
2025-12-04T14:00:07.9160496Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float32 PASSED [0.0065s] [ 44%]
2025-12-04T14:00:07.9160842Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float64 PASSED [0.1756s] [ 44%]
2025-12-04T14:00:07.9161184Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int16 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9161524Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int32 PASSED [0.1755s] [ 44%]
2025-12-04T14:00:07.9161905Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int64 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9162279Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int8 PASSED [0.1757s] [ 44%]
2025-12-04T14:00:07.9162616Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_uint8 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9162976Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float32 PASSED [0.1756s] [ 44%]
2025-12-04T14:00:07.9163339Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float64 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9163691Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int16 PASSED [0.1757s] [ 45%]
2025-12-04T14:00:07.9164041Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int32 PASSED [0.0064s] [ 45%]
2025-12-04T14:00:07.9164390Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int64 PASSED [0.1749s] [ 45%]
2025-12-04T14:00:07.9164734Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int8 PASSED [0.0065s] [ 45%]
2025-12-04T14:00:07.9165086Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_uint8 PASSED [0.1755s] [ 45%]
2025-12-04T14:00:07.9165481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex128 PASSED [0.0067s] [ 45%]
2025-12-04T14:00:07.9165833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex64 PASSED [0.1753s] [ 45%]
2025-12-04T14:00:07.9166175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float32 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9166552Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9166887Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int16 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9167219Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int32 PASSED [0.1757s] [ 45%]
2025-12-04T14:00:07.9167549Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9167877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int8 PASSED [0.1752s] [ 45%]
2025-12-04T14:00:07.9168207Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_uint8 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9168578Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex128 PASSED [0.1755s] [ 45%]
2025-12-04T14:00:07.9168929Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9169276Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float32 PASSED [0.1756s] [ 45%]
2025-12-04T14:00:07.9169619Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9169955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int16 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9170294Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int32 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9170626Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9170960Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int8 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9171296Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_uint8 PASSED [0.1760s] [ 45%]
2025-12-04T14:00:07.9171655Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex128 PASSED [0.0067s] [ 45%]
2025-12-04T14:00:07.9172010Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9172351Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float32 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9172740Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9173115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int16 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9173448Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int32 PASSED [0.1756s] [ 45%]
2025-12-04T14:00:07.9173785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9174115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int8 PASSED [0.1761s] [ 45%]
2025-12-04T14:00:07.9174452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_uint8 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9174804Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex128 PASSED [0.1762s] [ 46%]
2025-12-04T14:00:07.9175153Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex64 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9175504Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float32 PASSED [0.1755s] [ 46%]
2025-12-04T14:00:07.9175843Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float64 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9176172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int16 PASSED [0.1760s] [ 46%]
2025-12-04T14:00:07.9176549Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int32 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9176880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int64 PASSED [0.1759s] [ 46%]
2025-12-04T14:00:07.9177246Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int8 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9177582Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_uint8 PASSED [0.1759s] [ 46%]
2025-12-04T14:00:07.9177940Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex128 PASSED [0.0067s] [ 46%]
2025-12-04T14:00:07.9178295Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex64 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9178638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float32 PASSED [0.0065s] [ 46%]
2025-12-04T14:00:07.9178983Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float64 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9179372Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int16 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9179706Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int32 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9180044Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int64 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9180376Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int8 PASSED [0.1763s] [ 46%]
2025-12-04T14:00:07.9180715Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_uint8 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9181071Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float32 PASSED [0.1764s] [ 46%]
2025-12-04T14:00:07.9181421Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float64 PASSED [0.0065s] [ 46%]
2025-12-04T14:00:07.9181777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int16 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9182123Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int32 PASSED [0.0064s] [ 46%]
2025-12-04T14:00:07.9182462Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int64 PASSED [0.1762s] [ 46%]
2025-12-04T14:00:07.9182802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int8 PASSED [0.0064s] [ 46%]
2025-12-04T14:00:07.9183143Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_uint8 PASSED [0.1757s] [ 46%]
2025-12-04T14:00:07.9183656Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_bfloat16 PASSED [0.0563s] [ 46%]
2025-12-04T14:00:07.9184119Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float16 PASSED [0.0347s] [ 46%]
2025-12-04T14:00:07.9184540Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float32 PASSED [0.0340s] [ 46%]
2025-12-04T14:00:07.9184966Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float64 PASSED [0.0339s] [ 46%]
2025-12-04T14:00:07.9185384Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int16 PASSED [0.0525s] [ 46%]
2025-12-04T14:00:07.9185810Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int32 PASSED [0.0260s] [ 47%]
2025-12-04T14:00:07.9186221Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int64 PASSED [0.0254s] [ 47%]
2025-12-04T14:00:07.9186632Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int8 PASSED [0.0258s] [ 47%]
2025-12-04T14:00:07.9187052Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_uint8 PASSED [0.0258s] [ 47%]
2025-12-04T14:00:07.9187477Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_bfloat16 PASSED [0.0341s] [ 47%]
2025-12-04T14:00:07.9187943Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float16 PASSED [0.0339s] [ 47%]
2025-12-04T14:00:07.9188368Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float32 PASSED [0.0339s] [ 47%]
2025-12-04T14:00:07.9188829Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float64 PASSED [0.0340s] [ 47%]
2025-12-04T14:00:07.9189242Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int16 PASSED [0.0256s] [ 47%]
2025-12-04T14:00:07.9189655Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int32 PASSED [0.0256s] [ 47%]
2025-12-04T14:00:07.9190073Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int64 PASSED [0.0253s] [ 47%]
2025-12-04T14:00:07.9190482Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int8 PASSED [0.0256s] [ 47%]
2025-12-04T14:00:07.9190895Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_uint8 PASSED [0.0253s] [ 47%]
2025-12-04T14:00:07.9191322Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bfloat16 PASSED [1.4614s] [ 47%]
2025-12-04T14:00:07.9191732Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bool PASSED [1.4099s] [ 47%]
2025-12-04T14:00:07.9192166Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex128 PASSED [3.7291s] [ 47%]
2025-12-04T14:00:07.9192600Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex64 PASSED [2.2238s] [ 47%]
2025-12-04T14:00:07.9193027Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float16 PASSED [1.4384s] [ 47%]
2025-12-04T14:00:07.9193456Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float32 PASSED [0.7592s] [ 47%]
2025-12-04T14:00:07.9193874Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float64 PASSED [0.7599s] [ 47%]
2025-12-04T14:00:07.9194285Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int16 PASSED [0.0261s] [ 47%]
2025-12-04T14:00:07.9194704Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int32 PASSED [0.0257s] [ 47%]
2025-12-04T14:00:07.9195118Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int64 PASSED [0.0252s] [ 47%]
2025-12-04T14:00:07.9195575Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int8 PASSED [0.0258s] [ 47%]
2025-12-04T14:00:07.9196027Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_uint8 PASSED [0.0257s] [ 47%]
2025-12-04T14:00:07.9196454Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bfloat16 PASSED [0.0423s] [ 47%]
2025-12-04T14:00:07.9196866Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bool PASSED [0.0304s] [ 47%]
2025-12-04T14:00:07.9197300Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex128 PASSED [0.0302s] [ 47%]
2025-12-04T14:00:07.9197729Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex64 PASSED [0.0302s] [ 47%]
2025-12-04T14:00:07.9198146Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float16 PASSED [0.0372s] [ 47%]
2025-12-04T14:00:07.9198563Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float32 PASSED [0.0372s] [ 47%]
2025-12-04T14:00:07.9198985Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float64 PASSED [0.0374s] [ 48%]
2025-12-04T14:00:07.9199434Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int16 PASSED [0.0287s] [ 48%]
2025-12-04T14:00:07.9199843Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int32 PASSED [0.0285s] [ 48%]
2025-12-04T14:00:07.9200249Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int64 PASSED [0.0281s] [ 48%]
2025-12-04T14:00:07.9200693Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int8 PASSED [0.0285s] [ 48%]
2025-12-04T14:00:07.9201104Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_uint8 PASSED [0.0288s] [ 48%]
2025-12-04T14:00:07.9201414Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_bfloat16 PASSED [0.0428s] [ 48%]
2025-12-04T14:00:07.9201731Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_complex128 PASSED [0.0131s] [ 48%]
2025-12-04T14:00:07.9202031Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_float64 PASSED [0.0127s] [ 48%]
2025-12-04T14:00:07.9202504Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_complex128 SKIPPED [0.0002s] (multi-GPU not supported) [ 48%]
2025-12-04T14:00:07.9202958Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_float64 SKIPPED [0.0002s] (multi-GPU not supported) [ 48%]
2025-12-04T14:00:07.9203293Z test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_complex128 PASSED [0.0023s] [ 48%]
2025-12-04T14:00:07.9203614Z test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_float64 PASSED [0.0025s] [ 48%]
2025-12-04T14:00:07.9203916Z test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_complex128 PASSED [0.0029s] [ 48%]
2025-12-04T14:00:07.9204201Z test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_float64 PASSED [0.0028s] [ 48%]
2025-12-04T14:00:07.9204475Z test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_complex128 PASSED [0.0026s] [ 48%]
2025-12-04T14:00:07.9204729Z test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_float64 PASSED [0.0025s] [ 48%]
2025-12-04T14:00:07.9204993Z test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_complex128 PASSED [0.0161s] [ 48%]
2025-12-04T14:00:07.9205243Z test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_float64 PASSED [0.0159s] [ 48%]
2025-12-04T14:00:07.9205480Z test_sparse.py::TestSparseCUDA::test_any_cuda PASSED [0.0023s]           [ 48%]
2025-12-04T14:00:07.9205742Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float32 PASSED [0.0131s] [ 48%]
2025-12-04T14:00:07.9206000Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float64 PASSED [0.0129s] [ 48%]
2025-12-04T14:00:07.9206294Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int16 PASSED [0.0102s] [ 48%]
2025-12-04T14:00:07.9206548Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int32 PASSED [0.0102s] [ 48%]
2025-12-04T14:00:07.9206834Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int64 PASSED [0.0108s] [ 48%]
2025-12-04T14:00:07.9207084Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int8 PASSED [0.0101s] [ 48%]
2025-12-04T14:00:07.9207330Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_uint8 PASSED [0.0101s] [ 48%]
2025-12-04T14:00:07.9207567Z test_sparse.py::TestSparseCUDA::test_assign_cuda_float64 PASSED [0.0030s] [ 48%]
2025-12-04T14:00:07.9208070Z test_sparse.py::TestSparseCUDA::test_basic_cuda_complex128 PASSED [0.0151s] [ 48%]
2025-12-04T14:00:07.9208400Z test_sparse.py::TestSparseCUDA::test_basic_cuda_float64 PASSED [0.0144s] [ 48%]
2025-12-04T14:00:07.9208681Z test_sparse.py::TestSparseCUDA::test_basic_ops_cuda_float64 PASSED [0.3694s] [ 48%]
2025-12-04T14:00:07.9208923Z test_sparse.py::TestSparseCUDA::test_bmm_cuda_float64 PASSED [0.2602s]   [ 49%]
2025-12-04T14:00:07.9209206Z test_sparse.py::TestSparseCUDA::test_bmm_deterministic_cuda_float64 PASSED [0.1863s] [ 49%]
2025-12-04T14:00:07.9209444Z test_sparse.py::TestSparseCUDA::test_bmm_oob_cuda PASSED [0.0367s]       [ 49%]
2025-12-04T14:00:07.9210183Z test_sparse.py::TestSparseCUDA::test_bmm_windows_error_cuda_float64 SKIPPED [0.0003s] (this test ensures bmm sparse-dense CUDA gives an error when run on Windows with CUDA < 11.0) [ 49%]
2025-12-04T14:00:07.9210434Z test_sparse.py::TestSparseCUDA::test_cat_cuda_complex128 PASSED [0.0381s] [ 49%]
2025-12-04T14:00:07.9210667Z test_sparse.py::TestSparseCUDA::test_cat_cuda_float64 PASSED [0.0367s]   [ 49%]
2025-12-04T14:00:07.9211041Z test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_complex128 PASSED [0.0040s] [ 49%]
2025-12-04T14:00:07.9211355Z test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_float64 PASSED [0.0031s] [ 49%]
2025-12-04T14:00:07.9211605Z test_sparse.py::TestSparseCUDA::test_clone_cuda_complex128 PASSED [0.0075s] [ 49%]
2025-12-04T14:00:07.9211839Z test_sparse.py::TestSparseCUDA::test_clone_cuda_float64 PASSED [0.0074s] [ 49%]
2025-12-04T14:00:07.9212305Z test_sparse.py::TestSparseCUDA::test_coalesce_accepts_large_tensor_cuda_float32 SKIPPED [0.1715s] (Insufficient cuda memory) [ 49%]
2025-12-04T14:00:07.9212559Z test_sparse.py::TestSparseCUDA::test_coalesce_cuda_bfloat16 PASSED [0.0205s] [ 49%]
2025-12-04T14:00:07.9212823Z test_sparse.py::TestSparseCUDA::test_coalesce_cuda_complex128 PASSED [0.0189s] [ 49%]
2025-12-04T14:00:07.9213070Z test_sparse.py::TestSparseCUDA::test_coalesce_cuda_float64 PASSED [0.0178s] [ 49%]
2025-12-04T14:00:07.9213389Z test_sparse.py::TestSparseCUDA::test_coalesce_reference_cycle_cuda_float64 PASSED [0.0022s] [ 49%]
2025-12-04T14:00:07.9213780Z test_sparse.py::TestSparseCUDA::test_coalesce_transpose_mm_cuda_float64 SKIPPED [0.0013s] (Only runs on cpu) [ 49%]
2025-12-04T14:00:07.9214037Z test_sparse.py::TestSparseCUDA::test_contig_cuda_complex128 PASSED [0.0056s] [ 49%]
2025-12-04T14:00:07.9214280Z test_sparse.py::TestSparseCUDA::test_contig_cuda_float64 PASSED [0.0055s] [ 49%]
2025-12-04T14:00:07.9214563Z test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_complex128 PASSED [0.0055s] [ 49%]
2025-12-04T14:00:07.9214823Z test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_float64 PASSED [0.0053s] [ 49%]
2025-12-04T14:00:07.9215177Z test_sparse.py::TestSparseCUDA::test_ctor_is_coalesced_with_gradcheck_cuda_float64 PASSED [0.2881s] [ 49%]
2025-12-04T14:00:07.9215457Z test_sparse.py::TestSparseCUDA::test_ctor_large_sizes_cuda_float64 PASSED [0.0021s] [ 49%]
2025-12-04T14:00:07.9215745Z test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_complex128 PASSED [0.0018s] [ 49%]
2025-12-04T14:00:07.9216019Z test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_float64 PASSED [0.0019s] [ 49%]
2025-12-04T14:00:07.9216254Z test_sparse.py::TestSparseCUDA::test_cuda_empty_cuda PASSED [0.0022s]    [ 49%]
2025-12-04T14:00:07.9216596Z test_sparse.py::TestSparseCUDA::test_div_by_sparse_error_cuda PASSED [0.0019s] [ 49%]
2025-12-04T14:00:07.9216874Z test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float32 PASSED [0.0117s] [ 49%]
2025-12-04T14:00:07.9217228Z test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float64 PASSED [0.0114s] [ 49%]
2025-12-04T14:00:07.9217476Z test_sparse.py::TestSparseCUDA::test_dsmm_cuda_float64 PASSED [0.0622s]  [ 49%]
2025-12-04T14:00:07.9217765Z test_sparse.py::TestSparseCUDA::test_dtypes_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 49%]
2025-12-04T14:00:07.9218190Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bfloat16 SKIPPED [0.0017s] (Only runs on cpu) [ 49%]
2025-12-04T14:00:07.9218623Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bool SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9219137Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9219568Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9219983Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9220445Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9220856Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9221259Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9221739Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9222141Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9222548Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9222953Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_uint8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9223367Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bfloat16 SKIPPED [0.0016s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9223767Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bool SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9224193Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9224611Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9225018Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9225424Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9225833Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float64 SKIPPED [0.0017s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9226232Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9226636Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9227035Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9227427Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9227877Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_uint8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9228183Z test_sparse.py::TestSparseCUDA::test_empty_like_cuda_complex128 PASSED [0.0099s] [ 50%]
2025-12-04T14:00:07.9228448Z test_sparse.py::TestSparseCUDA::test_empty_like_cuda_float64 PASSED [0.0086s] [ 50%]
2025-12-04T14:00:07.9228762Z test_sparse.py::TestSparseCUDA::test_factory_copy_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9229098Z test_sparse.py::TestSparseCUDA::test_factory_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9229434Z test_sparse.py::TestSparseCUDA::test_factory_cuda_complex64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9229753Z test_sparse.py::TestSparseCUDA::test_factory_cuda_float16 SKIPPED [0.0013s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9230074Z test_sparse.py::TestSparseCUDA::test_factory_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9230395Z test_sparse.py::TestSparseCUDA::test_factory_cuda_float64 SKIPPED [0.0017s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9230694Z test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_complex128 PASSED [0.0022s] [ 51%]
2025-12-04T14:00:07.9230979Z test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_float64 PASSED [0.0020s] [ 51%]
2025-12-04T14:00:07.9231328Z test_sparse.py::TestSparseCUDA::test_factory_device_type_inference_cuda PASSED [0.0047s] [ 51%]
2025-12-04T14:00:07.9231598Z test_sparse.py::TestSparseCUDA::test_factory_empty_indices_cuda PASSED [0.0019s] [ 51%]
2025-12-04T14:00:07.9231867Z test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_complex128 PASSED [0.0021s] [ 51%]
2025-12-04T14:00:07.9232167Z test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_float64 PASSED [0.0025s] [ 51%]
2025-12-04T14:00:07.9232460Z test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_complex128 PASSED [0.0037s] [ 51%]
2025-12-04T14:00:07.9232738Z test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_float64 PASSED [0.0035s] [ 51%]
2025-12-04T14:00:07.9233038Z test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_complex128 PASSED [0.0029s] [ 51%]
2025-12-04T14:00:07.9233328Z test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_float64 PASSED [0.0027s] [ 51%]
2025-12-04T14:00:07.9233733Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex128 SKIPPED [0.0013s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9234136Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex64 SKIPPED [0.0013s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9234524Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float16 SKIPPED [0.0017s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9234911Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9235303Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9235683Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9235987Z test_sparse.py::TestSparseCUDA::test_floor_divide_by_sparse_error_cuda PASSED [0.0018s] [ 51%]
2025-12-04T14:00:07.9236280Z test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_complex128 PASSED [0.0429s] [ 51%]
2025-12-04T14:00:07.9236566Z test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_float64 PASSED [0.0317s] [ 51%]
2025-12-04T14:00:07.9236807Z test_sparse.py::TestSparseCUDA::test_hsmm_cuda_float64 PASSED [0.0214s]  [ 51%]
2025-12-04T14:00:07.9237083Z test_sparse.py::TestSparseCUDA::test_index_select_cuda_complex128 PASSED [0.1254s] [ 51%]
2025-12-04T14:00:07.9237345Z test_sparse.py::TestSparseCUDA::test_index_select_cuda_float64 PASSED [0.1184s] [ 51%]
2025-12-04T14:00:07.9237744Z test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_complex128 PASSED [0.0114s] [ 51%]
2025-12-04T14:00:07.9238181Z test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_float64 PASSED [0.0108s] [ 51%]
2025-12-04T14:00:07.9238594Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_complex128 PASSED [0.1502s] [ 51%]
2025-12-04T14:00:07.9238994Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_float64 PASSED [0.1327s] [ 51%]
2025-12-04T14:00:07.9239381Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_complex128 PASSED [0.6841s] [ 51%]
2025-12-04T14:00:07.9239751Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_float64 PASSED [0.6462s] [ 51%]
2025-12-04T14:00:07.9240189Z test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_complex128 SKIPPED [0.0015s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9240610Z test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_float64 SKIPPED [0.0013s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9240852Z test_sparse.py::TestSparseCUDA::test_is_nonzero_cuda PASSED [0.0034s]    [ 51%]
2025-12-04T14:00:07.9241093Z test_sparse.py::TestSparseCUDA::test_is_sparse_cuda PASSED [0.0014s]     [ 52%]
2025-12-04T14:00:07.9241331Z test_sparse.py::TestSparseCUDA::test_isnan_cuda PASSED [0.0036s]         [ 52%]
2025-12-04T14:00:07.9241568Z test_sparse.py::TestSparseCUDA::test_legacy_new_cuda PASSED [0.0019s]    [ 52%]
2025-12-04T14:00:07.9241951Z test_sparse.py::TestSparseCUDA::test_legacy_new_device_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 52%]
2025-12-04T14:00:07.9242187Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_float32 PASSED [0.0053s] [ 52%]
2025-12-04T14:00:07.9242465Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_float64 PASSED [0.0051s] [ 52%]
2025-12-04T14:00:07.9242708Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int16 PASSED [0.0046s]   [ 52%]
2025-12-04T14:00:07.9242939Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int32 PASSED [0.0050s]   [ 52%]
2025-12-04T14:00:07.9243179Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int64 PASSED [0.0045s]   [ 52%]
2025-12-04T14:00:07.9243410Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int8 PASSED [0.0045s]    [ 52%]
2025-12-04T14:00:07.9243644Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_uint8 PASSED [0.0045s]   [ 52%]
2025-12-04T14:00:07.9243930Z test_sparse.py::TestSparseCUDA::test_log_softmax_float_cuda_float32 PASSED [0.0056s] [ 52%]
2025-12-04T14:00:07.9244224Z test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float32 PASSED [0.0037s] [ 52%]
2025-12-04T14:00:07.9244513Z test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float64 PASSED [0.0064s] [ 52%]
2025-12-04T14:00:07.9244769Z test_sparse.py::TestSparseCUDA::test_mm_cuda_complex128 PASSED [0.1149s] [ 52%]
2025-12-04T14:00:07.9245003Z test_sparse.py::TestSparseCUDA::test_mm_cuda_float64 PASSED [0.0382s]    [ 52%]
2025-12-04T14:00:07.9245239Z test_sparse.py::TestSparseCUDA::test_mv_cuda_float64 PASSED [0.0336s]    [ 52%]
2025-12-04T14:00:07.9245497Z test_sparse.py::TestSparseCUDA::test_narrow_cuda_complex128 PASSED [0.0559s] [ 52%]
2025-12-04T14:00:07.9245735Z test_sparse.py::TestSparseCUDA::test_narrow_cuda_float64 PASSED [0.0529s] [ 52%]
2025-12-04T14:00:07.9246019Z test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_complex128 PASSED [0.0143s] [ 52%]
2025-12-04T14:00:07.9246278Z test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_float64 PASSED [0.0134s] [ 52%]
2025-12-04T14:00:07.9246527Z test_sparse.py::TestSparseCUDA::test_negative_indices_cuda PASSED [0.0017s] [ 52%]
2025-12-04T14:00:07.9246773Z test_sparse.py::TestSparseCUDA::test_new_cuda_complex128 PASSED [0.0051s] [ 52%]
2025-12-04T14:00:07.9247009Z test_sparse.py::TestSparseCUDA::test_new_cuda_float64 PASSED [0.0049s]   [ 52%]
2025-12-04T14:00:07.9247380Z test_sparse.py::TestSparseCUDA::test_new_device_multi_gpu_cuda SKIPPED [0.0002s] (only one GPU detected) [ 52%]
2025-12-04T14:00:07.9247641Z test_sparse.py::TestSparseCUDA::test_new_device_single_gpu_cuda PASSED [0.0019s] [ 52%]
2025-12-04T14:00:07.9247932Z test_sparse.py::TestSparseCUDA::test_norm_cuda_complex128 PASSED [0.0321s] [ 52%]
2025-12-04T14:00:07.9248170Z test_sparse.py::TestSparseCUDA::test_norm_cuda_float64 PASSED [0.0119s]  [ 52%]
2025-12-04T14:00:07.9248494Z test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_complex128 PASSED [3.5504s] [ 52%]
2025-12-04T14:00:07.9248780Z test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_float64 PASSED [1.5578s] [ 52%]
2025-12-04T14:00:07.9249107Z test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_complex128 PASSED [3.9389s] [ 52%]
2025-12-04T14:00:07.9249374Z test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_float64 PASSED [1.5074s] [ 53%]
2025-12-04T14:00:07.9249623Z test_sparse.py::TestSparseCUDA::test_pickle_cuda_float64 PASSED [0.0168s] [ 53%]
2025-12-04T14:00:07.9249896Z test_sparse.py::TestSparseCUDA::test_print_coalesced_cuda_float64 PASSED [0.0153s] [ 53%]
2025-12-04T14:00:07.9250180Z test_sparse.py::TestSparseCUDA::test_print_uncoalesced_cuda_float64 PASSED [0.0146s] [ 53%]
2025-12-04T14:00:07.9250423Z test_sparse.py::TestSparseCUDA::test_resize_as_cuda PASSED [0.0022s]     [ 53%]
2025-12-04T14:00:07.9250676Z test_sparse.py::TestSparseCUDA::test_resize_cuda_complex128 PASSED [0.0093s] [ 53%]
2025-12-04T14:00:07.9250919Z test_sparse.py::TestSparseCUDA::test_resize_cuda_float64 PASSED [0.0087s] [ 53%]
2025-12-04T14:00:07.9251290Z test_sparse.py::TestSparseCUDA::test_saddmm_cuda_complex128 SKIPPED [0.0013s] (Only runs on cpu) [ 53%]
2025-12-04T14:00:07.9251608Z test_sparse.py::TestSparseCUDA::test_saddmm_cuda_float64 SKIPPED [0.0018s] (Only runs on cpu) [ 53%]
2025-12-04T14:00:07.9252003Z test_sparse.py::TestSparseCUDA::test_same_gpu_cuda SKIPPED [0.0012s] (fewer than 2 devices detected) [ 53%]
2025-12-04T14:00:07.9252255Z test_sparse.py::TestSparseCUDA::test_scalar_cuda_complex128 PASSED [0.0051s] [ 53%]
2025-12-04T14:00:07.9252501Z test_sparse.py::TestSparseCUDA::test_scalar_cuda_float64 PASSED [0.0047s] [ 53%]
2025-12-04T14:00:07.9252753Z test_sparse.py::TestSparseCUDA::test_select_cuda_complex128 PASSED [0.1155s] [ 53%]
2025-12-04T14:00:07.9252996Z test_sparse.py::TestSparseCUDA::test_select_cuda_float64 PASSED [0.1106s] [ 53%]
2025-12-04T14:00:07.9253302Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int16 PASSED [0.0031s] [ 53%]
2025-12-04T14:00:07.9253608Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int32 PASSED [0.0022s] [ 53%]
2025-12-04T14:00:07.9253910Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int64 PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9254204Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int8 PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9254501Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_uint8 PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9254765Z test_sparse.py::TestSparseCUDA::test_shared_cuda_complex128 PASSED [0.0028s] [ 53%]
2025-12-04T14:00:07.9255004Z test_sparse.py::TestSparseCUDA::test_shared_cuda_float64 PASSED [0.0026s] [ 53%]
2025-12-04T14:00:07.9255264Z test_sparse.py::TestSparseCUDA::test_small_nnz_coalesced_cuda PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9255511Z test_sparse.py::TestSparseCUDA::test_softmax_cuda_float64 PASSED [1.4314s] [ 53%]
2025-12-04T14:00:07.9255785Z test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float32 PASSED [0.0034s] [ 53%]
2025-12-04T14:00:07.9256069Z test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float64 PASSED [0.0057s] [ 53%]
2025-12-04T14:00:07.9256306Z test_sparse.py::TestSparseCUDA::test_spadd_cuda_float64 PASSED [0.1069s] [ 53%]
2025-12-04T14:00:07.9256614Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex128 PASSED [0.0025s] [ 53%]
2025-12-04T14:00:07.9256923Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex64 PASSED [0.0029s] [ 53%]
2025-12-04T14:00:07.9257213Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float32 PASSED [0.0023s] [ 53%]
2025-12-04T14:00:07.9257554Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float64 PASSED [0.0023s] [ 53%]
2025-12-04T14:00:07.9257856Z test_sparse.py::TestSparseCUDA::test_sparse_add_out_bfloat16_cuda_float32 PASSED [0.0053s] [ 53%]
2025-12-04T14:00:07.9258412Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_bfloat16 SKIPPED [0.0013s] (addmm_sparse_cuda is not implemented for BFloat16 and Half) [ 54%]
2025-12-04T14:00:07.9258748Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_complex128 PASSED [7.2898s] [ 54%]
2025-12-04T14:00:07.9259309Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float16 SKIPPED [0.0016s] (addmm_sparse_cuda is not implemented for BFloat16 and Half) [ 54%]
2025-12-04T14:00:07.9259578Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float64 PASSED [2.5909s] [ 54%]
2025-12-04T14:00:07.9259845Z test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_complex128 PASSED [0.0021s] [ 54%]
2025-12-04T14:00:07.9260100Z test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_float64 PASSED [0.0018s] [ 54%]
2025-12-04T14:00:07.9260412Z test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_complex128 PASSED [0.0145s] [ 54%]
2025-12-04T14:00:07.9260707Z test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_float64 PASSED [0.0141s] [ 54%]
2025-12-04T14:00:07.9261314Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bfloat16 SKIPPED [0.0944s] (Test with dtype=torch.bfloat16, device=cuda:0 runs only with coalesced inputs) [ 54%]
2025-12-04T14:00:07.9261948Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bool SKIPPED [0.0058s] (Test with dtype=torch.bool, device=cuda:0 runs only with coalesced inputs) [ 54%]
2025-12-04T14:00:07.9262288Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex128 PASSED [0.2186s] [ 54%]
2025-12-04T14:00:07.9262579Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex64 PASSED [0.2173s] [ 54%]
2025-12-04T14:00:07.9263167Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float16 SKIPPED [0.0943s] (Test with dtype=torch.float16, device=cuda:0 runs only with coalesced inputs) [ 54%]
2025-12-04T14:00:07.9263448Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float32 PASSED [0.2047s] [ 54%]
2025-12-04T14:00:07.9263728Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float64 PASSED [0.2066s] [ 54%]
2025-12-04T14:00:07.9263992Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int16 PASSED [0.1558s] [ 54%]
2025-12-04T14:00:07.9264259Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int32 PASSED [0.1561s] [ 54%]
2025-12-04T14:00:07.9264524Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int64 PASSED [0.1554s] [ 54%]
2025-12-04T14:00:07.9264786Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int8 PASSED [0.1563s] [ 54%]
2025-12-04T14:00:07.9265050Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_uint8 PASSED [0.1555s] [ 54%]
2025-12-04T14:00:07.9265357Z test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128 PASSED [3.7690s] [ 54%]
2025-12-04T14:00:07.9265657Z test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_float64 PASSED [1.4600s] [ 54%]
2025-12-04T14:00:07.9265922Z test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_complex128 PASSED [0.0578s] [ 54%]
2025-12-04T14:00:07.9266180Z test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_float64 PASSED [0.0548s] [ 54%]
2025-12-04T14:00:07.9266487Z test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_complex128 PASSED [0.0629s] [ 54%]
2025-12-04T14:00:07.9266774Z test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_float64 PASSED [0.0613s] [ 54%]
2025-12-04T14:00:07.9267044Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_bfloat16 PASSED [0.8084s] [ 54%]
2025-12-04T14:00:07.9267329Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex128 PASSED [46.8235s] [ 54%]
2025-12-04T14:00:07.9267600Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex64 PASSED [0.6948s] [ 54%]
2025-12-04T14:00:07.9267864Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float16 PASSED [0.8018s] [ 54%]
2025-12-04T14:00:07.9268169Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float32 PASSED [0.7047s] [ 54%]
2025-12-04T14:00:07.9268485Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float64 PASSED [18.6624s] [ 55%]
2025-12-04T14:00:07.9268766Z test_sparse.py::TestSparseCUDA::test_sparse_mm_cuda_float64 PASSED [0.7153s] [ 55%]
2025-12-04T14:00:07.9269152Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.2343s] [ 55%]
2025-12-04T14:00:07.9269514Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.4105s] [ 55%]
2025-12-04T14:00:07.9269794Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 FAILED [0.3886s] [ 55%]
2025-12-04T14:00:07.9269801Z 
2025-12-04T14:00:07.9273680Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9273936Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9274048Z Traceback (most recent call last):
2025-12-04T14:00:07.9274346Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9274443Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9274701Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9274998Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9275239Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9275372Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9275884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9276043Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9276446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9276551Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9276983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9277076Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9277523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9277603Z     gradcheck_fn(
2025-12-04T14:00:07.9278024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9278121Z     raise GradcheckError(
2025-12-04T14:00:07.9278482Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9278601Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9278685Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9278771Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9278842Z         ...,
2025-12-04T14:00:07.9278920Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9279006Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9279152Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9279353Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9279490Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9279613Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9279685Z         ...,
2025-12-04T14:00:07.9279809Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9279928Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9280089Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9280202Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9280257Z 
2025-12-04T14:00:07.9280261Z 
2025-12-04T14:00:07.9280446Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9281018Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9281024Z 
2025-12-04T14:00:07.9281253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9281478Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9281579Z Traceback (most recent call last):
2025-12-04T14:00:07.9281859Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9281955Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9282209Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9282349Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9282585Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9282722Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9283175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9283332Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9283771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9283874Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9284348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9284439Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9284886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9284964Z     gradcheck_fn(
2025-12-04T14:00:07.9285387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9285475Z     raise GradcheckError(
2025-12-04T14:00:07.9285837Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9285956Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9286039Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9286128Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9286198Z         ...,
2025-12-04T14:00:07.9286277Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9286363Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9286501Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9286698Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9286831Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9286955Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9287030Z         ...,
2025-12-04T14:00:07.9287155Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9287275Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9287407Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9287515Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9287520Z 
2025-12-04T14:00:07.9287524Z 
2025-12-04T14:00:07.9287706Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9288234Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9288239Z 
2025-12-04T14:00:07.9288458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9288650Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9288893Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9289035Z Traceback (most recent call last):
2025-12-04T14:00:07.9289317Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9289404Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9289659Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9289791Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9290025Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9290157Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9290601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9290758Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9291150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9291251Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9291686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9291844Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9292293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9292413Z     gradcheck_fn(
2025-12-04T14:00:07.9292830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9292927Z     raise GradcheckError(
2025-12-04T14:00:07.9293280Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9293403Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9293489Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9293569Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9293637Z         ...,
2025-12-04T14:00:07.9293718Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9293794Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9293935Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9294136Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9294260Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9294386Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9294457Z         ...,
2025-12-04T14:00:07.9294582Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9294707Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9294837Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9294939Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9294944Z 
2025-12-04T14:00:07.9294956Z 
2025-12-04T14:00:07.9295137Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9295661Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9295666Z 
2025-12-04T14:00:07.9295887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9296380Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml -
2025-12-04T14:00:07.9296520Z =========================== short test summary info ============================
2025-12-04T14:00:07.9297226Z FAILED [0.3886s] test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9297391Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9297537Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9297619Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9297688Z         ...,
2025-12-04T14:00:07.9297766Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9297846Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9297981Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9298179Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9298307Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9298431Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9298501Z         ...,
2025-12-04T14:00:07.9298628Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9298751Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9298904Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9299096Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9299101Z 
2025-12-04T14:00:07.9299109Z 
2025-12-04T14:00:07.9299339Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9299864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9299907Z 
2025-12-04T14:00:07.9300131Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9300276Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9300470Z ======= 1 failed, 1503 passed, 203 skipped, 2 rerun in 228.71s (0:03:48) =======
2025-12-04T14:00:07.9300552Z Got exit code 1
2025-12-04T14:00:07.9300638Z Retrying single test...
2025-12-04T14:00:07.9300984Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml
2025-12-04T14:00:07.9301117Z ============================= test session starts ==============================
2025-12-04T14:00:07.9301409Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9301499Z cachedir: .pytest_cache
2025-12-04T14:00:07.9301945Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9302047Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9302136Z configfile: pytest.ini
2025-12-04T14:00:07.9302595Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9302788Z collecting ... collected 3100 items / 3099 deselected / 1 selected
2025-12-04T14:00:07.9303259Z stepcurrent: skipping 1706 already run items. Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9303352Z Running 1 items in this shard
2025-12-04T14:00:07.9303356Z 
2025-12-04T14:00:07.9303721Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.4337s] [100%]
2025-12-04T14:00:07.9304081Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.3819s] [100%]
2025-12-04T14:00:07.9304367Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 FAILED [0.3789s] [100%]
2025-12-04T14:00:07.9304374Z 
2025-12-04T14:00:07.9304486Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9304704Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9304804Z Traceback (most recent call last):
2025-12-04T14:00:07.9305136Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9305225Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9305525Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9305662Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9305903Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9306033Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9306482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9306643Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9307035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9307135Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9307568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9307655Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9308494Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9308577Z     gradcheck_fn(
2025-12-04T14:00:07.9309135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9313287Z     raise GradcheckError(
2025-12-04T14:00:07.9313669Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9313955Z numerical:tensor([[0.6700, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314092Z         [0.0000, 0.5920, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314216Z         [0.0000, 0.0000, 0.1134,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314294Z         ...,
2025-12-04T14:00:07.9314418Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314538Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314670Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9314785Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9314993Z analytical:tensor([[0.6700, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315116Z         [0.0000, 0.5920, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315257Z         [0.0000, 0.0000, 0.1134,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315334Z         ...,
2025-12-04T14:00:07.9315456Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315580Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315708Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9315824Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9315829Z 
2025-12-04T14:00:07.9315833Z 
2025-12-04T14:00:07.9316019Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9316587Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9316592Z 
2025-12-04T14:00:07.9316831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9317078Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9317187Z Traceback (most recent call last):
2025-12-04T14:00:07.9317489Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9317582Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9317839Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9317974Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9318279Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9318410Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9318964Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9319126Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9319519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9319623Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9320062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9320151Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9320599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9320680Z     gradcheck_fn(
2025-12-04T14:00:07.9321099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9321194Z     raise GradcheckError(
2025-12-04T14:00:07.9321553Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9321672Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9321753Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9321926Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9321998Z         ...,
2025-12-04T14:00:07.9322119Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9322198Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9322340Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9322539Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9322673Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9322799Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9322869Z         ...,
2025-12-04T14:00:07.9322996Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9323116Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9323244Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9323353Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9323358Z 
2025-12-04T14:00:07.9323362Z 
2025-12-04T14:00:07.9323544Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9324069Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9324073Z 
2025-12-04T14:00:07.9324295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9324412Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9324634Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9324732Z Traceback (most recent call last):
2025-12-04T14:00:07.9325010Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9325098Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9325349Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9325486Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9325722Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9325851Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9326296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9326502Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9326897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9327043Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9327477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9327568Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9328017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9328099Z     gradcheck_fn(
2025-12-04T14:00:07.9328527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9328617Z     raise GradcheckError(
2025-12-04T14:00:07.9328970Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9329089Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9329169Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9329248Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9329320Z         ...,
2025-12-04T14:00:07.9329399Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9329478Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9329614Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9329867Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9330067Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9330188Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9330259Z         ...,
2025-12-04T14:00:07.9330380Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9330503Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9330631Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9330733Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9330741Z 
2025-12-04T14:00:07.9330745Z 
2025-12-04T14:00:07.9330922Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9331450Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9331458Z 
2025-12-04T14:00:07.9331680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9332175Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml -
2025-12-04T14:00:07.9332314Z =========================== short test summary info ============================
2025-12-04T14:00:07.9333017Z FAILED [0.3789s] test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9333137Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9333220Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9333300Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9333370Z         ...,
2025-12-04T14:00:07.9333445Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9333527Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9333669Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9333867Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9333994Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9334115Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9334185Z         ...,
2025-12-04T14:00:07.9334356Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9334475Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9334644Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9334751Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9334756Z 
2025-12-04T14:00:07.9334760Z 
2025-12-04T14:00:07.9334940Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9335462Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9335470Z 
2025-12-04T14:00:07.9335690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9335841Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9336007Z ================= 1 failed, 3099 deselected, 2 rerun in 1.38s ==================
2025-12-04T14:00:07.9336085Z Got exit code 1
2025-12-04T14:00:07.9336175Z Retrying single test...
2025-12-04T14:00:07.9336519Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml
2025-12-04T14:00:07.9336657Z ============================= test session starts ==============================
2025-12-04T14:00:07.9336948Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9337034Z cachedir: .pytest_cache
2025-12-04T14:00:07.9337531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9337670Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9337755Z configfile: pytest.ini
2025-12-04T14:00:07.9338216Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9338407Z collecting ... collected 3100 items / 3099 deselected / 1 selected
2025-12-04T14:00:07.9338903Z stepcurrent: skipping 1706 already run items. Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9339075Z Running 1 items in this shard
2025-12-04T14:00:07.9339081Z 
2025-12-04T14:00:07.9339445Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.4380s] [100%]
2025-12-04T14:00:07.9339810Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.3848s] [100%]
2025-12-04T14:00:07.9340092Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 FAILED [0.3816s] [100%]
2025-12-04T14:00:07.9340100Z 
2025-12-04T14:00:07.9340214Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9340435Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9340533Z Traceback (most recent call last):
2025-12-04T14:00:07.9340814Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9340898Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9341153Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9341291Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9341524Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9341658Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9342110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9342268Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9342663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9342760Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9343245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9343332Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9343820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9343902Z     gradcheck_fn(
2025-12-04T14:00:07.9344319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9344410Z     raise GradcheckError(
2025-12-04T14:00:07.9344767Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9344957Z numerical:tensor([[0.6700, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345078Z         [0.0000, 0.5920, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345193Z         [0.0000, 0.0000, 0.1134,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345266Z         ...,
2025-12-04T14:00:07.9345382Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345493Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345613Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9345722Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9345910Z analytical:tensor([[0.6700, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346076Z         [0.0000, 0.5920, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346188Z         [0.0000, 0.0000, 0.1134,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346298Z         ...,
2025-12-04T14:00:07.9346410Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346519Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346637Z         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9346748Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9346752Z 
2025-12-04T14:00:07.9346756Z 
2025-12-04T14:00:07.9346930Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9347460Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9347465Z 
2025-12-04T14:00:07.9347685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9347911Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9348011Z Traceback (most recent call last):
2025-12-04T14:00:07.9348290Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9348374Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9348629Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9348789Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9349048Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9349177Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9349625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9349781Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9350177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9350278Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9350711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9350798Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9351247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9351374Z     gradcheck_fn(
2025-12-04T14:00:07.9351792Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9351926Z     raise GradcheckError(
2025-12-04T14:00:07.9352281Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9352401Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9352481Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9352561Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9352631Z         ...,
2025-12-04T14:00:07.9352709Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9352792Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9352938Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9353139Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9353272Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9353400Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9353475Z         ...,
2025-12-04T14:00:07.9353607Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9353733Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9353859Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9354017Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9354022Z 
2025-12-04T14:00:07.9354063Z 
2025-12-04T14:00:07.9354241Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9354774Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9354779Z 
2025-12-04T14:00:07.9355008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9355126Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9355355Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9355454Z Traceback (most recent call last):
2025-12-04T14:00:07.9355734Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9355820Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9356074Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9356212Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9356447Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9356578Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9357027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9357184Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9357588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9357685Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9358117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9358210Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9358663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9358745Z     gradcheck_fn(
2025-12-04T14:00:07.9359173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9359262Z     raise GradcheckError(
2025-12-04T14:00:07.9359628Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9359795Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9359879Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9360002Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9360074Z         ...,
2025-12-04T14:00:07.9360154Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9360238Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9360375Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9360584Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9360720Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9360842Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9360914Z         ...,
2025-12-04T14:00:07.9361037Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9361160Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9361290Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9361397Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9361402Z 
2025-12-04T14:00:07.9361406Z 
2025-12-04T14:00:07.9361585Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9362186Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9362191Z 
2025-12-04T14:00:07.9362416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9362949Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml -
2025-12-04T14:00:07.9363090Z =========================== short test summary info ============================
2025-12-04T14:00:07.9363792Z FAILED [0.3816s] test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9363913Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9363995Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9364077Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9364147Z         ...,
2025-12-04T14:00:07.9364224Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9364307Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9364446Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9364652Z analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9364775Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9364898Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9364976Z         ...,
2025-12-04T14:00:07.9365098Z         [ 0.0000,  0.0000,  0.0000,  ...,  1.0153,  0.0000,  0.0000],
2025-12-04T14:00:07.9365218Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.8483,  0.0000],
2025-12-04T14:00:07.9365354Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000, -0.0211]],
2025-12-04T14:00:07.9365459Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9365463Z 
2025-12-04T14:00:07.9365467Z 
2025-12-04T14:00:07.9365648Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9366172Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9366179Z 
2025-12-04T14:00:07.9366400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9366549Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9366715Z ================= 1 failed, 3099 deselected, 2 rerun in 1.39s ==================
2025-12-04T14:00:07.9366841Z Got exit code 1
2025-12-04T14:00:07.9367152Z FAILED CONSISTENTLY: test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9367545Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T14:00:07.9367888Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml
2025-12-04T14:00:07.9368027Z ============================= test session starts ==============================
2025-12-04T14:00:07.9368320Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9368412Z cachedir: .pytest_cache
2025-12-04T14:00:07.9368884Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9369010Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9369103Z configfile: pytest.ini
2025-12-04T14:00:07.9369560Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9369761Z collecting ... collected 3100 items / 1707 deselected / 1393 selected
2025-12-04T14:00:07.9369884Z stepcurrent: skipping 1707 already run items.
2025-12-04T14:00:07.9369977Z Running 1393 items in this shard
2025-12-04T14:00:07.9369986Z 
2025-12-04T14:00:07.9370400Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.5012s] [  0%]
2025-12-04T14:00:07.9370762Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4495s] [  0%]
2025-12-04T14:00:07.9371087Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 FAILED [0.4447s] [  0%]
2025-12-04T14:00:07.9371092Z 
2025-12-04T14:00:07.9371203Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9371426Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9371532Z Traceback (most recent call last):
2025-12-04T14:00:07.9371815Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9371909Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9372162Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9372297Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9372543Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9372676Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9373131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9373294Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9373686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9373796Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9374228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9374316Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9374766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9374846Z     gradcheck_fn(
2025-12-04T14:00:07.9375268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9375366Z     raise GradcheckError(
2025-12-04T14:00:07.9375722Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9375924Z numerical:tensor([[ 0.9997,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9376107Z         [ 0.0000, -0.8658,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9376233Z         [ 0.0000,  0.0000, -0.9013,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9376314Z         ...,
2025-12-04T14:00:07.9376480Z         [ 0.0000,  0.0000,  0.0000,  ..., -0.5610,  0.0000,  0.0000],
2025-12-04T14:00:07.9376604Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.9928,  0.0000],
2025-12-04T14:00:07.9376732Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.2065]],
2025-12-04T14:00:07.9376838Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9377041Z analytical:tensor([[ 0.9997,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9377167Z         [ 0.0000, -0.8658,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9377292Z         [ 0.0000,  0.0000, -0.9013,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9377370Z         ...,
2025-12-04T14:00:07.9377489Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9377618Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9377747Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]],
2025-12-04T14:00:07.9377856Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9377860Z 
2025-12-04T14:00:07.9377864Z 
2025-12-04T14:00:07.9378045Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9378627Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9378634Z 
2025-12-04T14:00:07.9378931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9379230Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9379330Z Traceback (most recent call last):
2025-12-04T14:00:07.9379610Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9379700Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9379951Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9380092Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9380325Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9380455Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9380909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9381066Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9381462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9381562Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9381993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9382086Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9382535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9382620Z     gradcheck_fn(
2025-12-04T14:00:07.9383036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9383125Z     raise GradcheckError(
2025-12-04T14:00:07.9383484Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9383602Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9383684Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9383767Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9383839Z         ...,
2025-12-04T14:00:07.9383920Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384001Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384189Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9384313Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384431Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384507Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384585Z         ...,
2025-12-04T14:00:07.9384664Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384740Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9384879Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9384884Z 
2025-12-04T14:00:07.9384890Z 
2025-12-04T14:00:07.9385070Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9385595Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9385599Z 
2025-12-04T14:00:07.9385822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9385937Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9386165Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9386264Z Traceback (most recent call last):
2025-12-04T14:00:07.9386543Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9386631Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9386931Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9387069Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9387345Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9387480Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9387929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9388089Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9388485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9388584Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9389017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9389106Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9389553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9389632Z     gradcheck_fn(
2025-12-04T14:00:07.9390052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9390144Z     raise GradcheckError(
2025-12-04T14:00:07.9390504Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9390626Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9390708Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9390796Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9390867Z         ...,
2025-12-04T14:00:07.9390944Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391032Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391168Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9391293Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391375Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391457Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391532Z         ...,
2025-12-04T14:00:07.9391609Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391687Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9391825Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9391877Z 
2025-12-04T14:00:07.9391881Z 
2025-12-04T14:00:07.9392061Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9392657Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9392663Z 
2025-12-04T14:00:07.9392887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9393375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml -
2025-12-04T14:00:07.9393524Z =========================== short test summary info ============================
2025-12-04T14:00:07.9394219Z FAILED [0.4447s] test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9394346Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9394422Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9394500Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9394574Z         ...,
2025-12-04T14:00:07.9394654Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9394730Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9394868Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9394990Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9395116Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9395197Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9395307Z         ...,
2025-12-04T14:00:07.9395389Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9395467Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9395601Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9395608Z 
2025-12-04T14:00:07.9395612Z 
2025-12-04T14:00:07.9395796Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9396319Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9396323Z 
2025-12-04T14:00:07.9396555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9396702Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9396871Z ================= 1 failed, 1707 deselected, 2 rerun in 1.59s ==================
2025-12-04T14:00:07.9396962Z Got exit code 1
2025-12-04T14:00:07.9397049Z Retrying single test...
2025-12-04T14:00:07.9397390Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml
2025-12-04T14:00:07.9397526Z ============================= test session starts ==============================
2025-12-04T14:00:07.9397819Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9397909Z cachedir: .pytest_cache
2025-12-04T14:00:07.9398355Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9398459Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9398555Z configfile: pytest.ini
2025-12-04T14:00:07.9399064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9399259Z collecting ... collected 3100 items / 3099 deselected / 1 selected
2025-12-04T14:00:07.9399731Z stepcurrent: skipping 1707 already run items. Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9399823Z Running 1 items in this shard
2025-12-04T14:00:07.9399828Z 
2025-12-04T14:00:07.9400194Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4911s] [100%]
2025-12-04T14:00:07.9400601Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4435s] [100%]
2025-12-04T14:00:07.9400930Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 FAILED [0.4391s] [100%]
2025-12-04T14:00:07.9400935Z 
2025-12-04T14:00:07.9401052Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9401274Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9401380Z Traceback (most recent call last):
2025-12-04T14:00:07.9401661Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9401750Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9402011Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9402146Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9402383Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9402515Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9402968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9403134Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9403527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9403683Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9404119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9404249Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9404698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9404785Z     gradcheck_fn(
2025-12-04T14:00:07.9405203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9405295Z     raise GradcheckError(
2025-12-04T14:00:07.9405651Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9405851Z numerical:tensor([[ 0.9997,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9405985Z         [ 0.0000, -0.8658,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9406115Z         [ 0.0000,  0.0000, -0.9013,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9406187Z         ...,
2025-12-04T14:00:07.9406316Z         [ 0.0000,  0.0000,  0.0000,  ..., -0.5610,  0.0000,  0.0000],
2025-12-04T14:00:07.9406435Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.9928,  0.0000],
2025-12-04T14:00:07.9406563Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.2065]],
2025-12-04T14:00:07.9406673Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9406870Z analytical:tensor([[ 0.9997,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9406997Z         [ 0.0000, -0.8658,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9407122Z         [ 0.0000,  0.0000, -0.9013,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9407197Z         ...,
2025-12-04T14:00:07.9407320Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9407439Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9407572Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]],
2025-12-04T14:00:07.9407681Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9407686Z 
2025-12-04T14:00:07.9407690Z 
2025-12-04T14:00:07.9408044Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9408597Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9408775Z 
2025-12-04T14:00:07.9408997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9409281Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9409384Z Traceback (most recent call last):
2025-12-04T14:00:07.9409661Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9409750Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9410003Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9410141Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9410381Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9410509Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9410958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9411118Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9411516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9411621Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9412054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9412147Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9412664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9412803Z     gradcheck_fn(
2025-12-04T14:00:07.9413225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9413313Z     raise GradcheckError(
2025-12-04T14:00:07.9413669Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9413793Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9413874Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9413958Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414034Z         ...,
2025-12-04T14:00:07.9414111Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414192Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414330Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9414454Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414536Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414615Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414688Z         ...,
2025-12-04T14:00:07.9414774Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414849Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9414986Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9414995Z 
2025-12-04T14:00:07.9414999Z 
2025-12-04T14:00:07.9415181Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9415708Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9415713Z 
2025-12-04T14:00:07.9415940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9416057Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9416286Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9416385Z Traceback (most recent call last):
2025-12-04T14:00:07.9416662Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9416757Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9417058Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9417192Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9417468Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9417600Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9418050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9418211Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9418603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9418708Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9419189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9419278Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9419731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9419811Z     gradcheck_fn(
2025-12-04T14:00:07.9420234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9420324Z     raise GradcheckError(
2025-12-04T14:00:07.9420678Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9420852Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9420933Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421052Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421129Z         ...,
2025-12-04T14:00:07.9421208Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421283Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421428Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9421554Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421635Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421716Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421790Z         ...,
2025-12-04T14:00:07.9421872Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9421949Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9422081Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9422085Z 
2025-12-04T14:00:07.9422089Z 
2025-12-04T14:00:07.9422275Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9422803Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9422808Z 
2025-12-04T14:00:07.9423034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9423522Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml -
2025-12-04T14:00:07.9423664Z =========================== short test summary info ============================
2025-12-04T14:00:07.9424370Z FAILED [0.4391s] test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9424487Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9424569Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9424650Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9424725Z         ...,
2025-12-04T14:00:07.9424805Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9424888Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9425022Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9425146Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9425298Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9425381Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9425454Z         ...,
2025-12-04T14:00:07.9425531Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9425652Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9425785Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9425789Z 
2025-12-04T14:00:07.9425793Z 
2025-12-04T14:00:07.9432145Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9432721Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9432731Z 
2025-12-04T14:00:07.9432969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9433125Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9433299Z ================= 1 failed, 3099 deselected, 2 rerun in 1.56s ==================
2025-12-04T14:00:07.9433383Z Got exit code 1
2025-12-04T14:00:07.9433475Z Retrying single test...
2025-12-04T14:00:07.9433826Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml
2025-12-04T14:00:07.9433963Z ============================= test session starts ==============================
2025-12-04T14:00:07.9434262Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9434434Z cachedir: .pytest_cache
2025-12-04T14:00:07.9434887Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9435036Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9435126Z configfile: pytest.ini
2025-12-04T14:00:07.9435593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9435789Z collecting ... collected 3100 items / 3099 deselected / 1 selected
2025-12-04T14:00:07.9436267Z stepcurrent: skipping 1707 already run items. Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9436364Z Running 1 items in this shard
2025-12-04T14:00:07.9436369Z 
2025-12-04T14:00:07.9436734Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4995s] [100%]
2025-12-04T14:00:07.9437106Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4364s] [100%]
2025-12-04T14:00:07.9437398Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 FAILED [0.4332s] [100%]
2025-12-04T14:00:07.9437403Z 
2025-12-04T14:00:07.9437526Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9437755Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9437861Z Traceback (most recent call last):
2025-12-04T14:00:07.9438153Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9438248Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9438510Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9438652Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9438892Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9439038Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9439495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9439663Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9440069Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9440223Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9440663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9440804Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9441257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9441344Z     gradcheck_fn(
2025-12-04T14:00:07.9441769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9441865Z     raise GradcheckError(
2025-12-04T14:00:07.9442236Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9442445Z numerical:tensor([[ 0.9997,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9442583Z         [ 0.0000, -0.8658,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9442722Z         [ 0.0000,  0.0000, -0.9013,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9442798Z         ...,
2025-12-04T14:00:07.9442933Z         [ 0.0000,  0.0000,  0.0000,  ..., -0.5610,  0.0000,  0.0000],
2025-12-04T14:00:07.9443065Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.9928,  0.0000],
2025-12-04T14:00:07.9443196Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.2065]],
2025-12-04T14:00:07.9443313Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9443563Z analytical:tensor([[ 0.9997,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9443696Z         [ 0.0000, -0.8658,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9443874Z         [ 0.0000,  0.0000, -0.9013,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9443951Z         ...,
2025-12-04T14:00:07.9444081Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9444208Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
2025-12-04T14:00:07.9444342Z         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]],
2025-12-04T14:00:07.9444459Z        device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9444463Z 
2025-12-04T14:00:07.9444470Z 
2025-12-04T14:00:07.9444656Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9445196Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9445202Z 
2025-12-04T14:00:07.9445433Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9445666Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9445779Z Traceback (most recent call last):
2025-12-04T14:00:07.9446059Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9446151Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9446418Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9446559Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9446807Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9446945Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9447395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9447563Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9448054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9448215Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9448792Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9448992Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9452794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9452897Z     gradcheck_fn(
2025-12-04T14:00:07.9453392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9453497Z     raise GradcheckError(
2025-12-04T14:00:07.9453857Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9453988Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454077Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454165Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454246Z         ...,
2025-12-04T14:00:07.9454329Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454410Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454559Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9454689Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454769Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454855Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9454932Z         ...,
2025-12-04T14:00:07.9455015Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9455100Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9455241Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9455247Z 
2025-12-04T14:00:07.9455304Z 
2025-12-04T14:00:07.9455493Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9456069Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9456075Z 
2025-12-04T14:00:07.9456301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9456426Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9456651Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9456762Z Traceback (most recent call last):
2025-12-04T14:00:07.9457045Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9477845Z     test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9478122Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9478264Z     gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9478529Z   File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9478693Z     return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9479146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9479313Z     return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9479710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9479816Z     return _gradcheck_helper(**args)
2025-12-04T14:00:07.9480254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9480347Z     _gradcheck_real_imag(
2025-12-04T14:00:07.9480801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9480885Z     gradcheck_fn(
2025-12-04T14:00:07.9481307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9481402Z     raise GradcheckError(
2025-12-04T14:00:07.9481761Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9481951Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482040Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482125Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482201Z         ...,
2025-12-04T14:00:07.9482326Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482409Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482555Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9482681Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482763Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482849Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9482927Z         ...,
2025-12-04T14:00:07.9483009Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9483092Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9483229Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9483237Z 
2025-12-04T14:00:07.9483241Z 
2025-12-04T14:00:07.9483422Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9483946Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9483951Z 
2025-12-04T14:00:07.9484173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9484667Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml -
2025-12-04T14:00:07.9484873Z =========================== short test summary info ============================
2025-12-04T14:00:07.9485651Z FAILED [0.4332s] test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9485772Z numerical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9485854Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9485935Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486003Z         ...,
2025-12-04T14:00:07.9486079Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486162Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486296Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9486420Z analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486497Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486577Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486647Z         ...,
2025-12-04T14:00:07.9486728Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486805Z         [0., 0., 0.,  ..., 0., 0., 0.],
2025-12-04T14:00:07.9486942Z         [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9486947Z 
2025-12-04T14:00:07.9486951Z 
2025-12-04T14:00:07.9487132Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9487666Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9487671Z 
2025-12-04T14:00:07.9487895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9488056Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9488223Z ================= 1 failed, 3099 deselected, 2 rerun in 1.56s ==================
2025-12-04T14:00:07.9488306Z Got exit code 1
2025-12-04T14:00:07.9488628Z FAILED CONSISTENTLY: test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9488983Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T14:00:07.9489339Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml
2025-12-04T14:00:07.9489522Z ============================= test session starts ==============================
2025-12-04T14:00:07.9489815Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9489953Z cachedir: .pytest_cache
2025-12-04T14:00:07.9490400Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9490502Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9490595Z configfile: pytest.ini
2025-12-04T14:00:07.9491061Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9491271Z collecting ... collected 3100 items / 1708 deselected / 1392 selected
2025-12-04T14:00:07.9491397Z stepcurrent: skipping 1708 already run items.
2025-12-04T14:00:07.9491494Z Running 1392 items in this shard
2025-12-04T14:00:07.9491499Z 
2025-12-04T14:00:07.9492122Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_bfloat16 SKIPPED [0.1640s] (Test with dtype=torch.bfloat16, device=cuda:0 runs only with coalesced inputs) [  0%]
2025-12-04T14:00:07.9492437Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex128 PASSED [0.1365s] [  0%]
2025-12-04T14:00:07.9492740Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex64 PASSED [0.0583s] [  0%]
2025-12-04T14:00:07.9493381Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float16 SKIPPED [0.0237s] (Test with dtype=torch.float16, device=cuda:0 runs only with coalesced inputs) [  0%]
2025-12-04T14:00:07.9493672Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float32 PASSED [0.0545s] [  0%]
2025-12-04T14:00:07.9494005Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float64 PASSED [0.0546s] [  0%]
2025-12-04T14:00:07.9494277Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int16 PASSED [0.0518s] [  0%]
2025-12-04T14:00:07.9494550Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int32 PASSED [0.0472s] [  0%]
2025-12-04T14:00:07.9494832Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int64 PASSED [0.0471s] [  0%]
2025-12-04T14:00:07.9495108Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int8 PASSED [0.0470s] [  0%]
2025-12-04T14:00:07.9495389Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_uint8 PASSED [0.0468s] [  0%]
2025-12-04T14:00:07.9495731Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_bool SKIPPED [0.0013s] (Only runs on cpu) [  0%]
2025-12-04T14:00:07.9496104Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [  0%]
2025-12-04T14:00:07.9496472Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex64 SKIPPED [0.0014s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9496823Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9497180Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9497523Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int16 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9497867Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int32 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9498217Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int64 SKIPPED [0.0014s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9498561Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int8 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9498961Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_uint8 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9499270Z test_sparse.py::TestSparseCUDA::test_sparse_sum_cuda_float64 PASSED [1.5036s] [  1%]
2025-12-04T14:00:07.9499590Z test_sparse.py::TestSparseCUDA::test_sparse_to_numpy_cuda SKIPPED [0.0015s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9499989Z test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9500317Z test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [  1%]
2025-12-04T14:00:07.9500626Z test_sparse.py::TestSparseCUDA::test_storage_not_null_cuda PASSED [0.0017s] [  1%]
2025-12-04T14:00:07.9500867Z test_sparse.py::TestSparseCUDA::test_sum_cuda_bool PASSED [0.0151s]      [  1%]
2025-12-04T14:00:07.9501110Z test_sparse.py::TestSparseCUDA::test_sum_cuda_complex128 PASSED [0.0253s] [  2%]
2025-12-04T14:00:07.9501363Z test_sparse.py::TestSparseCUDA::test_sum_cuda_complex64 PASSED [0.0248s] [  2%]
2025-12-04T14:00:07.9501607Z test_sparse.py::TestSparseCUDA::test_sum_cuda_float32 PASSED [0.0326s]   [  2%]
2025-12-04T14:00:07.9501847Z test_sparse.py::TestSparseCUDA::test_sum_cuda_float64 PASSED [0.0226s]   [  2%]
2025-12-04T14:00:07.9502086Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int16 PASSED [0.0145s]     [  2%]
2025-12-04T14:00:07.9502322Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int32 PASSED [0.0143s]     [  2%]
2025-12-04T14:00:07.9502561Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int64 PASSED [0.0141s]     [  2%]
2025-12-04T14:00:07.9502798Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int8 PASSED [0.0142s]      [  2%]
2025-12-04T14:00:07.9503030Z test_sparse.py::TestSparseCUDA::test_sum_cuda_uint8 PASSED [0.0143s]     [  2%]
2025-12-04T14:00:07.9503293Z test_sparse.py::TestSparseCUDA::test_t_empty_cuda_complex128 PASSED [0.0025s] [  2%]
2025-12-04T14:00:07.9503579Z test_sparse.py::TestSparseCUDA::test_t_empty_cuda_float64 PASSED [0.0020s] [  2%]
2025-12-04T14:00:07.9503937Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_complex128 PASSED [0.0836s] [  2%]
2025-12-04T14:00:07.9504244Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_float64 PASSED [0.0388s] [  2%]
2025-12-04T14:00:07.9504560Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_complex128 PASSED [0.0999s] [  2%]
2025-12-04T14:00:07.9504867Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_float64 PASSED [0.0373s] [  3%]
2025-12-04T14:00:07.9505209Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_bfloat16 PASSED [0.0888s] [  3%]
2025-12-04T14:00:07.9505558Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex128 PASSED [0.0886s] [  3%]
2025-12-04T14:00:07.9505908Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex64 PASSED [0.0885s] [  3%]
2025-12-04T14:00:07.9506244Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float16 PASSED [0.0876s] [  3%]
2025-12-04T14:00:07.9506583Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float32 PASSED [0.0883s] [  3%]
2025-12-04T14:00:07.9506911Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float64 PASSED [0.1495s] [  3%]
2025-12-04T14:00:07.9507246Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_bfloat16 PASSED [0.0787s] [  3%]
2025-12-04T14:00:07.9507603Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex128 PASSED [0.0787s] [  3%]
2025-12-04T14:00:07.9508228Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex64 PASSED [0.0784s] [  3%]
2025-12-04T14:00:07.9508671Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float16 PASSED [0.0784s] [  3%]
2025-12-04T14:00:07.9509011Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float32 PASSED [0.0785s] [  3%]
2025-12-04T14:00:07.9509339Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float64 PASSED [0.1393s] [  3%]
2025-12-04T14:00:07.9509605Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_bfloat16 PASSED [0.0856s] [  3%]
2025-12-04T14:00:07.9509869Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex128 PASSED [0.0666s] [  4%]
2025-12-04T14:00:07.9510136Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex64 PASSED [0.0669s] [  4%]
2025-12-04T14:00:07.9510511Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float16 PASSED [0.0655s] [  4%]
2025-12-04T14:00:07.9510819Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float64 PASSED [0.0648s] [  4%]
2025-12-04T14:00:07.9511071Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_int32 PASSED [0.0528s] [  4%]
2025-12-04T14:00:07.9511336Z test_sparse.py::TestSparseCUDA::test_transpose_cuda_complex128 PASSED [0.0381s] [  4%]
2025-12-04T14:00:07.9511589Z test_sparse.py::TestSparseCUDA::test_transpose_cuda_float64 PASSED [0.0366s] [  4%]
2025-12-04T14:00:07.9511860Z test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_complex128 PASSED [0.0258s] [  4%]
2025-12-04T14:00:07.9512115Z test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_float64 PASSED [0.0245s] [  4%]
2025-12-04T14:00:07.9512368Z test_sparse.py::TestSparseCUDA::test_zeros_cuda_complex128 PASSED [0.2423s] [  4%]
2025-12-04T14:00:07.9512605Z test_sparse.py::TestSparseCUDA::test_zeros_cuda_float64 PASSED [0.2367s] [  4%]
2025-12-04T14:00:07.9512874Z test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_complex128 PASSED [0.2489s] [  4%]
2025-12-04T14:00:07.9513132Z test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_float64 PASSED [0.2487s] [  4%]
2025-12-04T14:00:07.9513505Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_fast_cuda PASSED [0.5590s] [  4%]
2025-12-04T14:00:07.9513886Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_slow_cuda PASSED [27.8082s] [  5%]
2025-12-04T14:00:07.9514333Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_fast_cuda PASSED [0.5185s] [  5%]
2025-12-04T14:00:07.9514777Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_slow_cuda PASSED [25.3798s] [  5%]
2025-12-04T14:00:07.9515152Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_fast_cuda PASSED [0.4612s] [  5%]
2025-12-04T14:00:07.9515530Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_slow_cuda PASSED [25.8149s] [  5%]
2025-12-04T14:00:07.9515918Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_fast_cuda PASSED [0.4917s] [  5%]
2025-12-04T14:00:07.9516309Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_slow_cuda PASSED [23.9239s] [  5%]
2025-12-04T14:00:07.9516681Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_fast_cuda PASSED [0.8813s] [  5%]
2025-12-04T14:00:07.9517061Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_slow_cuda PASSED [24.3449s] [  5%]
2025-12-04T14:00:07.9517446Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_fast_cuda PASSED [1.0267s] [  5%]
2025-12-04T14:00:07.9517843Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_slow_cuda PASSED [27.1667s] [  5%]
2025-12-04T14:00:07.9518212Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_fast_cuda PASSED [0.4711s] [  5%]
2025-12-04T14:00:07.9518611Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_slow_cuda PASSED [20.6427s] [  5%]
2025-12-04T14:00:07.9519031Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_fast_cuda PASSED [0.4668s] [  5%]
2025-12-04T14:00:07.9519423Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_slow_cuda PASSED [23.6739s] [  6%]
2025-12-04T14:00:07.9519796Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_fast_cuda PASSED [0.4209s] [  6%]
2025-12-04T14:00:07.9520174Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_slow_cuda PASSED [16.6649s] [  6%]
2025-12-04T14:00:07.9520556Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_fast_cuda PASSED [0.4370s] [  6%]
2025-12-04T14:00:07.9520952Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_slow_cuda PASSED [20.3160s] [  6%]
2025-12-04T14:00:07.9521360Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bfloat16 PASSED [0.0948s] [  6%]
2025-12-04T14:00:07.9521749Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bool PASSED [0.0154s] [  6%]
2025-12-04T14:00:07.9522133Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex128 PASSED [0.0180s] [  6%]
2025-12-04T14:00:07.9522503Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex32 PASSED [0.8402s] [  6%]
2025-12-04T14:00:07.9522875Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex64 PASSED [0.0165s] [  6%]
2025-12-04T14:00:07.9523240Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float16 PASSED [0.0155s] [  6%]
2025-12-04T14:00:07.9523601Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float32 PASSED [0.0149s] [  6%]
2025-12-04T14:00:07.9523965Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float64 PASSED [0.0160s] [  6%]
2025-12-04T14:00:07.9524315Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int16 PASSED [0.0151s] [  6%]
2025-12-04T14:00:07.9524670Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int32 PASSED [0.0147s] [  7%]
2025-12-04T14:00:07.9525015Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int64 PASSED [0.0152s] [  7%]
2025-12-04T14:00:07.9525432Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int8 PASSED [0.0155s] [  7%]
2025-12-04T14:00:07.9525825Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_uint8 PASSED [0.0150s] [  7%]
2025-12-04T14:00:07.9526186Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bfloat16 PASSED [0.0140s] [  7%]
2025-12-04T14:00:07.9526536Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bool PASSED [0.0144s] [  7%]
2025-12-04T14:00:07.9526913Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex128 PASSED [0.0165s] [  7%]
2025-12-04T14:00:07.9527283Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex32 PASSED [0.0142s] [  7%]
2025-12-04T14:00:07.9527656Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex64 PASSED [0.0149s] [  7%]
2025-12-04T14:00:07.9528019Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float16 PASSED [0.0145s] [  7%]
2025-12-04T14:00:07.9528383Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float32 PASSED [0.0139s] [  7%]
2025-12-04T14:00:07.9528785Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float64 PASSED [0.0150s] [  7%]
2025-12-04T14:00:07.9529143Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int16 PASSED [0.0139s] [  7%]
2025-12-04T14:00:07.9529497Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int32 PASSED [0.0136s] [  7%]
2025-12-04T14:00:07.9529846Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int64 PASSED [0.0137s] [  8%]
2025-12-04T14:00:07.9530195Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int8 PASSED [0.0141s] [  8%]
2025-12-04T14:00:07.9530543Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_uint8 PASSED [0.0138s] [  8%]
2025-12-04T14:00:07.9530910Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bfloat16 PASSED [0.0168s] [  8%]
2025-12-04T14:00:07.9531262Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bool PASSED [0.0147s] [  8%]
2025-12-04T14:00:07.9531633Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex128 PASSED [0.0218s] [  8%]
2025-12-04T14:00:07.9532003Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex32 PASSED [0.2239s] [  8%]
2025-12-04T14:00:07.9532417Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex64 PASSED [0.0190s] [  8%]
2025-12-04T14:00:07.9532815Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float16 PASSED [0.0164s] [  8%]
2025-12-04T14:00:07.9533180Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float32 PASSED [0.0164s] [  8%]
2025-12-04T14:00:07.9533538Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float64 PASSED [0.0179s] [  8%]
2025-12-04T14:00:07.9533887Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int16 PASSED [0.0146s] [  8%]
2025-12-04T14:00:07.9534242Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int32 PASSED [0.0148s] [  8%]
2025-12-04T14:00:07.9534775Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int64 PASSED [0.0153s] [  8%]
2025-12-04T14:00:07.9535225Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int8 PASSED [0.0145s] [  9%]
2025-12-04T14:00:07.9535574Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_uint8 PASSED [0.0146s] [  9%]
2025-12-04T14:00:07.9535938Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bfloat16 PASSED [0.0129s] [  9%]
2025-12-04T14:00:07.9536286Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bool PASSED [0.0126s] [  9%]
2025-12-04T14:00:07.9536714Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex128 PASSED [0.0143s] [  9%]
2025-12-04T14:00:07.9537127Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex32 PASSED [0.0137s] [  9%]
2025-12-04T14:00:07.9537490Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex64 PASSED [0.0133s] [  9%]
2025-12-04T14:00:07.9537849Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float16 PASSED [0.0127s] [  9%]
2025-12-04T14:00:07.9538214Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float32 PASSED [0.0128s] [  9%]
2025-12-04T14:00:07.9538572Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float64 PASSED [0.0136s] [  9%]
2025-12-04T14:00:07.9538925Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int16 PASSED [0.0124s] [  9%]
2025-12-04T14:00:07.9539353Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int32 PASSED [0.0124s] [  9%]
2025-12-04T14:00:07.9539702Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int64 PASSED [0.0126s] [  9%]
2025-12-04T14:00:07.9540055Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int8 PASSED [0.0129s] [  9%]
2025-12-04T14:00:07.9540404Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_uint8 PASSED [0.0124s] [ 10%]
2025-12-04T14:00:07.9540772Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bfloat16 PASSED [0.0222s] [ 10%]
2025-12-04T14:00:07.9541119Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bool PASSED [0.0216s] [ 10%]
2025-12-04T14:00:07.9541493Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex128 PASSED [0.0257s] [ 10%]
2025-12-04T14:00:07.9541864Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex32 PASSED [0.0241s] [ 10%]
2025-12-04T14:00:07.9542236Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex64 PASSED [0.0236s] [ 10%]
2025-12-04T14:00:07.9542601Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float16 PASSED [0.0222s] [ 10%]
2025-12-04T14:00:07.9542960Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float32 PASSED [0.0230s] [ 10%]
2025-12-04T14:00:07.9543316Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float64 PASSED [0.0235s] [ 10%]
2025-12-04T14:00:07.9543722Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int16 PASSED [0.0216s] [ 10%]
2025-12-04T14:00:07.9544109Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int32 PASSED [0.0218s] [ 10%]
2025-12-04T14:00:07.9544465Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int64 PASSED [0.0225s] [ 10%]
2025-12-04T14:00:07.9544808Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int8 PASSED [0.0217s] [ 10%]
2025-12-04T14:00:07.9545159Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_uint8 PASSED [0.0216s] [ 10%]
2025-12-04T14:00:07.9545998Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSC_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0013s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9546807Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSR_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9547623Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCOO_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0016s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9548490Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSC_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9549365Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSR_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9549946Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSC_cuda SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 11%]
2025-12-04T14:00:07.9550525Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSR_cuda SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 11%]
2025-12-04T14:00:07.9550872Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCOO_cuda PASSED [15.7366s] [ 11%]
2025-12-04T14:00:07.9551211Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSC_cuda PASSED [59.6674s] [ 11%]
2025-12-04T14:00:07.9551553Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSR_cuda PASSED [51.3308s] [ 11%]
2025-12-04T14:00:07.9552049Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSC_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9552539Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9553039Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9553532Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9554024Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9554450Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9554875Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSR_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9555300Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9555719Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9556189Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9556639Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_Strided_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9557078Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9557521Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSR_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9557963Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9558402Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9558889Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9559319Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_Strided_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9559727Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSC_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9560133Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSR_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9560584Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCOO_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9561024Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSC_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9561425Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSR_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9561727Z test_sparse.py::TestSparseAnyCUDA::test_generate_simple_inputs_cuda PASSED [0.1417s] [ 13%]
2025-12-04T14:00:07.9562167Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_complex128 SKIPPED [0.0025s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9562604Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_float64 SKIPPED [0.0018s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9563041Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_complex128 SKIPPED [0.0018s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9563464Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_float64 SKIPPED [0.0017s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9563906Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_complex128 SKIPPED [0.0018s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9564327Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_float64 SKIPPED [0.0017s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9564773Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_complex128 SKIPPED [0.0022s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9565199Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_float64 SKIPPED [0.0017s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9565635Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_complex128 SKIPPED [0.0129s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9566061Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_float64 SKIPPED [0.0039s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9566498Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_complex128 SKIPPED [0.0234s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9566923Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_float64 SKIPPED [0.0126s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9567356Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_complex128 SKIPPED [0.0105s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9567827Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_float64 SKIPPED [0.0059s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9568334Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_complex128 SKIPPED [0.0277s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9568807Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_float64 SKIPPED [0.0153s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9569203Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_complex128 PASSED [0.0708s] [ 14%]
2025-12-04T14:00:07.9569586Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_float64 PASSED [0.0327s] [ 14%]
2025-12-04T14:00:07.9569978Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_complex128 PASSED [0.1016s] [ 14%]
2025-12-04T14:00:07.9570363Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_float64 PASSED [0.0398s] [ 14%]
2025-12-04T14:00:07.9570751Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_complex128 PASSED [0.0211s] [ 14%]
2025-12-04T14:00:07.9571139Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_float64 PASSED [0.0084s] [ 14%]
2025-12-04T14:00:07.9571526Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_complex128 PASSED [0.1082s] [ 15%]
2025-12-04T14:00:07.9571944Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_float64 PASSED [0.0373s] [ 15%]
2025-12-04T14:00:07.9572430Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_complex128 SKIPPED [0.0410s] (NOT IMPL) [ 15%]
2025-12-04T14:00:07.9572852Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_float64 SKIPPED [0.0053s] (NOT IMPL) [ 15%]
2025-12-04T14:00:07.9573295Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_complex128 SKIPPED [0.0441s] (NOT IMPL) [ 15%]
2025-12-04T14:00:07.9573726Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_float64 SKIPPED [0.0235s] (NOT IMPL) [ 15%]
2025-12-04T14:00:07.9574114Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_complex128 PASSED [0.0260s] [ 15%]
2025-12-04T14:00:07.9574492Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_float64 PASSED [0.0095s] [ 15%]
2025-12-04T14:00:07.9574884Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_complex128 PASSED [0.1576s] [ 15%]
2025-12-04T14:00:07.9575266Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_float64 PASSED [0.0444s] [ 15%]
2025-12-04T14:00:07.9575656Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_complex128 PASSED [0.0368s] [ 15%]
2025-12-04T14:00:07.9576030Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_float64 PASSED [0.0078s] [ 15%]
2025-12-04T14:00:07.9576422Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_complex128 PASSED [0.1094s] [ 15%]
2025-12-04T14:00:07.9576805Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_float64 PASSED [0.0450s] [ 15%]
2025-12-04T14:00:07.9577196Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_complex128 PASSED [0.0171s] [ 16%]
2025-12-04T14:00:07.9577572Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_float64 PASSED [0.0074s] [ 16%]
2025-12-04T14:00:07.9577961Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_complex128 PASSED [0.0943s] [ 16%]
2025-12-04T14:00:07.9578342Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_float64 PASSED [0.0339s] [ 16%]
2025-12-04T14:00:07.9578802Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_complex128 PASSED [21.2771s] [ 16%]
2025-12-04T14:00:07.9579321Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_float64 PASSED [8.8414s] [ 16%]
2025-12-04T14:00:07.9579794Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_complex128 PASSED [15.0273s] [ 16%]
2025-12-04T14:00:07.9580199Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_float64 PASSED [5.5979s] [ 16%]
2025-12-04T14:00:07.9580625Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_complex128 PASSED [20.1339s] [ 16%]
2025-12-04T14:00:07.9581033Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_float64 PASSED [8.0320s] [ 16%]
2025-12-04T14:00:07.9581456Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_complex128 PASSED [14.6100s] [ 16%]
2025-12-04T14:00:07.9581858Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_float64 PASSED [5.4254s] [ 16%]
2025-12-04T14:00:07.9582279Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_complex128 PASSED [13.3815s] [ 16%]
2025-12-04T14:00:07.9582690Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_float64 PASSED [5.6125s] [ 16%]
2025-12-04T14:00:07.9583110Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_complex128 PASSED [12.7898s] [ 17%]
2025-12-04T14:00:07.9583558Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_float64 PASSED [4.8634s] [ 17%]
2025-12-04T14:00:07.9584018Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_complex128 PASSED [15.1337s] [ 17%]
2025-12-04T14:00:07.9584420Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_float64 PASSED [6.5944s] [ 17%]
2025-12-04T14:00:07.9584849Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_complex128 PASSED [14.1451s] [ 17%]
2025-12-04T14:00:07.9585257Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_float64 PASSED [5.2235s] [ 17%]
2025-12-04T14:00:07.9585681Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_complex128 PASSED [12.0550s] [ 17%]
2025-12-04T14:00:07.9586082Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_float64 PASSED [4.9622s] [ 17%]
2025-12-04T14:00:07.9586505Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_complex128 PASSED [13.7667s] [ 17%]
2025-12-04T14:00:07.9586913Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_float64 PASSED [4.9865s] [ 17%]
2025-12-04T14:00:07.9587268Z test_sparse.py::TestSparseAnyCUDA::test_invalid_blocksize_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 17%]
2025-12-04T14:00:07.9587627Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_bfloat16 PASSED [0.0911s] [ 17%]
2025-12-04T14:00:07.9587994Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex128 PASSED [0.0867s] [ 17%]
2025-12-04T14:00:07.9588348Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex32 PASSED [0.0864s] [ 17%]
2025-12-04T14:00:07.9588714Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex64 PASSED [0.0873s] [ 18%]
2025-12-04T14:00:07.9589101Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float16 PASSED [0.0863s] [ 18%]
2025-12-04T14:00:07.9589449Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float32 PASSED [0.0860s] [ 18%]
2025-12-04T14:00:07.9589790Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float64 PASSED [0.0702s] [ 18%]
2025-12-04T14:00:07.9590137Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_bfloat16 PASSED [0.0859s] [ 18%]
2025-12-04T14:00:07.9590553Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex128 PASSED [0.0860s] [ 18%]
2025-12-04T14:00:07.9590948Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex32 PASSED [0.0873s] [ 18%]
2025-12-04T14:00:07.9591307Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex64 PASSED [0.0860s] [ 18%]
2025-12-04T14:00:07.9591650Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float16 PASSED [0.0850s] [ 18%]
2025-12-04T14:00:07.9591991Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float32 PASSED [0.0864s] [ 18%]
2025-12-04T14:00:07.9592343Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float64 PASSED [0.0689s] [ 18%]
2025-12-04T14:00:07.9592758Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_bfloat16 SKIPPED [0.0159s] (NO SAMPLES!) [ 18%]
2025-12-04T14:00:07.9593195Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex128 SKIPPED [0.0158s] (NO SAMPLES!) [ 18%]
2025-12-04T14:00:07.9593619Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex32 SKIPPED [0.0154s] (NO SAMPLES!) [ 18%]
2025-12-04T14:00:07.9594041Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex64 SKIPPED [0.0154s] (NO SAMPLES!) [ 19%]
2025-12-04T14:00:07.9594497Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float16 SKIPPED [0.0157s] (NO SAMPLES!) [ 19%]
2025-12-04T14:00:07.9594908Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float32 SKIPPED [0.0153s] (NO SAMPLES!) [ 19%]
2025-12-04T14:00:07.9595449Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float64 SKIPPED [0.0153s] (NO SAMPLES!) [ 19%]
2025-12-04T14:00:07.9595804Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_bfloat16 PASSED [0.0802s] [ 19%]
2025-12-04T14:00:07.9596165Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex128 PASSED [0.0792s] [ 19%]
2025-12-04T14:00:07.9596557Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex32 PASSED [0.0791s] [ 19%]
2025-12-04T14:00:07.9596915Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex64 PASSED [0.0799s] [ 19%]
2025-12-04T14:00:07.9597268Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float16 PASSED [0.0787s] [ 19%]
2025-12-04T14:00:07.9597616Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float32 PASSED [0.0791s] [ 19%]
2025-12-04T14:00:07.9597963Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float64 PASSED [0.0645s] [ 19%]
2025-12-04T14:00:07.9598322Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_bfloat16 PASSED [0.0795s] [ 19%]
2025-12-04T14:00:07.9598684Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex128 PASSED [0.0794s] [ 19%]
2025-12-04T14:00:07.9599047Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex32 PASSED [0.0801s] [ 19%]
2025-12-04T14:00:07.9599409Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex64 PASSED [0.0790s] [ 20%]
2025-12-04T14:00:07.9599755Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float16 PASSED [0.0793s] [ 20%]
2025-12-04T14:00:07.9600113Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float32 PASSED [0.0804s] [ 20%]
2025-12-04T14:00:07.9600463Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float64 PASSED [0.0636s] [ 20%]
2025-12-04T14:00:07.9600812Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bfloat16 PASSED [0.0630s] [ 20%]
2025-12-04T14:00:07.9601153Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bool PASSED [0.0642s] [ 20%]
2025-12-04T14:00:07.9601560Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex128 PASSED [0.0636s] [ 20%]
2025-12-04T14:00:07.9601965Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex32 PASSED [0.0633s] [ 20%]
2025-12-04T14:00:07.9602323Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex64 PASSED [0.0640s] [ 20%]
2025-12-04T14:00:07.9602665Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float16 PASSED [0.0631s] [ 20%]
2025-12-04T14:00:07.9603019Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float32 PASSED [0.0629s] [ 20%]
2025-12-04T14:00:07.9603364Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float64 PASSED [0.0568s] [ 20%]
2025-12-04T14:00:07.9603709Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int16 PASSED [0.0629s] [ 20%]
2025-12-04T14:00:07.9604051Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int32 PASSED [0.0618s] [ 20%]
2025-12-04T14:00:07.9604386Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int64 PASSED [0.0642s] [ 21%]
2025-12-04T14:00:07.9604737Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int8 PASSED [0.0632s] [ 21%]
2025-12-04T14:00:07.9605076Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_uint8 PASSED [0.0632s] [ 21%]
2025-12-04T14:00:07.9605478Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bfloat16 PASSED [0.0640s] [ 21%]
2025-12-04T14:00:07.9605850Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bool PASSED [0.0628s] [ 21%]
2025-12-04T14:00:07.9606211Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex128 PASSED [0.0628s] [ 21%]
2025-12-04T14:00:07.9606575Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex32 PASSED [0.0639s] [ 21%]
2025-12-04T14:00:07.9606932Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex64 PASSED [0.0627s] [ 21%]
2025-12-04T14:00:07.9607290Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float16 PASSED [0.0630s] [ 21%]
2025-12-04T14:00:07.9607637Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float32 PASSED [0.0641s] [ 21%]
2025-12-04T14:00:07.9608156Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float64 PASSED [0.0551s] [ 21%]
2025-12-04T14:00:07.9608516Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int16 PASSED [0.0628s] [ 21%]
2025-12-04T14:00:07.9608891Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int32 PASSED [0.0641s] [ 21%]
2025-12-04T14:00:07.9609239Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int64 PASSED [0.0624s] [ 21%]
2025-12-04T14:00:07.9609573Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int8 PASSED [0.0629s] [ 22%]
2025-12-04T14:00:07.9609907Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_uint8 PASSED [0.0640s] [ 22%]
2025-12-04T14:00:07.9610332Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bfloat16 SKIPPED [0.0156s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9610737Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bool SKIPPED [0.0154s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9611173Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex128 SKIPPED [0.0157s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9611594Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex32 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9612017Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex64 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9612541Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float16 SKIPPED [0.0157s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9613007Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float32 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9613424Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float64 SKIPPED [0.0153s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9613824Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int16 SKIPPED [0.0158s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9614227Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int32 SKIPPED [0.0153s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9614637Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int64 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9615034Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int8 SKIPPED [0.0157s] (NO SAMPLES!) [ 22%]
2025-12-04T14:00:07.9615441Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_uint8 SKIPPED [0.0153s] (NO SAMPLES!) [ 23%]
2025-12-04T14:00:07.9615794Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bfloat16 PASSED [0.0554s] [ 23%]
2025-12-04T14:00:07.9616125Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bool PASSED [0.0563s] [ 23%]
2025-12-04T14:00:07.9616488Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex128 PASSED [0.0553s] [ 23%]
2025-12-04T14:00:07.9616901Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex32 PASSED [0.0553s] [ 23%]
2025-12-04T14:00:07.9617325Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex64 PASSED [0.0558s] [ 23%]
2025-12-04T14:00:07.9617675Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float16 PASSED [0.0559s] [ 23%]
2025-12-04T14:00:07.9618019Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float32 PASSED [0.0559s] [ 23%]
2025-12-04T14:00:07.9618366Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float64 PASSED [0.0494s] [ 23%]
2025-12-04T14:00:07.9618702Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int16 PASSED [0.0554s] [ 23%]
2025-12-04T14:00:07.9619077Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int32 PASSED [0.0554s] [ 23%]
2025-12-04T14:00:07.9619420Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int64 PASSED [0.0564s] [ 23%]
2025-12-04T14:00:07.9619751Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int8 PASSED [0.0551s] [ 23%]
2025-12-04T14:00:07.9620096Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_uint8 PASSED [0.0552s] [ 23%]
2025-12-04T14:00:07.9620443Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bfloat16 PASSED [0.0564s] [ 24%]
2025-12-04T14:00:07.9620774Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bool PASSED [0.0553s] [ 24%]
2025-12-04T14:00:07.9621141Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex128 PASSED [0.0546s] [ 24%]
2025-12-04T14:00:07.9621496Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex32 PASSED [0.0564s] [ 24%]
2025-12-04T14:00:07.9621859Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex64 PASSED [0.0556s] [ 24%]
2025-12-04T14:00:07.9622201Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float16 PASSED [0.0553s] [ 24%]
2025-12-04T14:00:07.9622542Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float32 PASSED [0.0564s] [ 24%]
2025-12-04T14:00:07.9622890Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float64 PASSED [0.0489s] [ 24%]
2025-12-04T14:00:07.9623222Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int16 PASSED [0.0551s] [ 24%]
2025-12-04T14:00:07.9623636Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int32 PASSED [0.0565s] [ 24%]
2025-12-04T14:00:07.9624034Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int64 PASSED [0.0554s] [ 24%]
2025-12-04T14:00:07.9624391Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int8 PASSED [0.0552s] [ 24%]
2025-12-04T14:00:07.9624762Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_uint8 PASSED [0.0562s] [ 24%]
2025-12-04T14:00:07.9625185Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSC_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 25%]
2025-12-04T14:00:07.9625618Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 25%]
2025-12-04T14:00:07.9626039Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 25%]
2025-12-04T14:00:07.9626468Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSC_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 25%]
2025-12-04T14:00:07.9626894Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 25%]
2025-12-04T14:00:07.9627304Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_Strided_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 25%]
2025-12-04T14:00:07.9627730Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex128 PASSED [0.0115s] [ 25%]
2025-12-04T14:00:07.9628191Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex64 PASSED [0.0112s] [ 25%]
2025-12-04T14:00:07.9628606Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float32 PASSED [0.0109s] [ 25%]
2025-12-04T14:00:07.9628988Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float64 PASSED [0.0116s] [ 25%]
2025-12-04T14:00:07.9629380Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex128 PASSED [0.0106s] [ 25%]
2025-12-04T14:00:07.9629772Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex64 PASSED [0.0106s] [ 25%]
2025-12-04T14:00:07.9630144Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float32 PASSED [0.0103s] [ 25%]
2025-12-04T14:00:07.9630523Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float64 PASSED [0.0107s] [ 25%]
2025-12-04T14:00:07.9630921Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex128 PASSED [0.0440s] [ 26%]
2025-12-04T14:00:07.9631316Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex64 PASSED [0.0441s] [ 26%]
2025-12-04T14:00:07.9631702Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float32 PASSED [0.0417s] [ 26%]
2025-12-04T14:00:07.9632076Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float64 PASSED [0.0419s] [ 26%]
2025-12-04T14:00:07.9632473Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex128 PASSED [0.0106s] [ 26%]
2025-12-04T14:00:07.9632870Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex64 PASSED [0.0105s] [ 26%]
2025-12-04T14:00:07.9633245Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float32 PASSED [0.0103s] [ 26%]
2025-12-04T14:00:07.9633629Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float64 PASSED [0.0108s] [ 26%]
2025-12-04T14:00:07.9634017Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex128 PASSED [0.0257s] [ 26%]
2025-12-04T14:00:07.9634402Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex64 PASSED [0.0127s] [ 26%]
2025-12-04T14:00:07.9634788Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float32 PASSED [0.0122s] [ 26%]
2025-12-04T14:00:07.9635206Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float64 PASSED [0.0126s] [ 26%]
2025-12-04T14:00:07.9635592Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bfloat16 PASSED [0.0108s] [ 26%]
2025-12-04T14:00:07.9635934Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bool PASSED [0.0055s] [ 26%]
2025-12-04T14:00:07.9636281Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex128 PASSED [0.0108s] [ 27%]
2025-12-04T14:00:07.9636634Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex32 PASSED [0.7139s] [ 27%]
2025-12-04T14:00:07.9636980Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex64 PASSED [0.0110s] [ 27%]
2025-12-04T14:00:07.9637319Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float16 PASSED [0.0108s] [ 27%]
2025-12-04T14:00:07.9637663Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float32 PASSED [0.0106s] [ 27%]
2025-12-04T14:00:07.9637996Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float64 PASSED [0.0111s] [ 27%]
2025-12-04T14:00:07.9638340Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int16 PASSED [0.0104s] [ 27%]
2025-12-04T14:00:07.9638670Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int32 PASSED [0.0104s] [ 27%]
2025-12-04T14:00:07.9639045Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int64 PASSED [0.0102s] [ 27%]
2025-12-04T14:00:07.9639379Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int8 PASSED [0.0107s] [ 27%]
2025-12-04T14:00:07.9639749Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_uint8 PASSED [0.0055s] [ 27%]
2025-12-04T14:00:07.9640092Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bfloat16 PASSED [0.0103s] [ 27%]
2025-12-04T14:00:07.9640422Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bool PASSED [0.0053s] [ 27%]
2025-12-04T14:00:07.9640769Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex128 PASSED [0.0107s] [ 27%]
2025-12-04T14:00:07.9641119Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex32 PASSED [0.0103s] [ 28%]
2025-12-04T14:00:07.9641463Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex64 PASSED [0.0103s] [ 28%]
2025-12-04T14:00:07.9641803Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float16 PASSED [0.0102s] [ 28%]
2025-12-04T14:00:07.9642134Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float32 PASSED [0.0113s] [ 28%]
2025-12-04T14:00:07.9642470Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float64 PASSED [0.0106s] [ 28%]
2025-12-04T14:00:07.9642808Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int16 PASSED [0.0098s] [ 28%]
2025-12-04T14:00:07.9643136Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int32 PASSED [0.0098s] [ 28%]
2025-12-04T14:00:07.9643462Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int64 PASSED [0.0105s] [ 28%]
2025-12-04T14:00:07.9643797Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int8 PASSED [0.0098s] [ 28%]
2025-12-04T14:00:07.9644125Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_uint8 PASSED [0.0053s] [ 28%]
2025-12-04T14:00:07.9644470Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bfloat16 PASSED [0.0336s] [ 28%]
2025-12-04T14:00:07.9644793Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bool PASSED [0.0177s] [ 28%]
2025-12-04T14:00:07.9645147Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex128 PASSED [0.0348s] [ 28%]
2025-12-04T14:00:07.9645496Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex32 PASSED [1.1282s] [ 28%]
2025-12-04T14:00:07.9645881Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex64 PASSED [0.0343s] [ 29%]
2025-12-04T14:00:07.9646261Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float16 PASSED [0.0338s] [ 29%]
2025-12-04T14:00:07.9646595Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float32 PASSED [0.0329s] [ 29%]
2025-12-04T14:00:07.9646927Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float64 PASSED [0.0329s] [ 29%]
2025-12-04T14:00:07.9647262Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int16 PASSED [0.0168s] [ 29%]
2025-12-04T14:00:07.9647589Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int32 PASSED [0.0172s] [ 29%]
2025-12-04T14:00:07.9647918Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int64 PASSED [0.0159s] [ 29%]
2025-12-04T14:00:07.9648243Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int8 PASSED [0.0167s] [ 29%]
2025-12-04T14:00:07.9648572Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_uint8 PASSED [0.0167s] [ 29%]
2025-12-04T14:00:07.9648920Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bfloat16 PASSED [0.0106s] [ 29%]
2025-12-04T14:00:07.9649246Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bool PASSED [0.0052s] [ 29%]
2025-12-04T14:00:07.9649598Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex128 PASSED [0.0101s] [ 29%]
2025-12-04T14:00:07.9649982Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex32 PASSED [0.0102s] [ 29%]
2025-12-04T14:00:07.9650391Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex64 PASSED [0.0106s] [ 29%]
2025-12-04T14:00:07.9650726Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float16 PASSED [0.0100s] [ 30%]
2025-12-04T14:00:07.9651058Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float32 PASSED [0.0100s] [ 30%]
2025-12-04T14:00:07.9651391Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float64 PASSED [0.0100s] [ 30%]
2025-12-04T14:00:07.9651727Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int16 PASSED [0.0101s] [ 30%]
2025-12-04T14:00:07.9652052Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int32 PASSED [0.0097s] [ 30%]
2025-12-04T14:00:07.9652377Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int64 PASSED [0.0095s] [ 30%]
2025-12-04T14:00:07.9652702Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int8 PASSED [0.0097s] [ 30%]
2025-12-04T14:00:07.9653030Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_uint8 PASSED [0.0056s] [ 30%]
2025-12-04T14:00:07.9653373Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bfloat16 PASSED [0.0121s] [ 30%]
2025-12-04T14:00:07.9653697Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bool PASSED [0.0050s] [ 30%]
2025-12-04T14:00:07.9654053Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex128 PASSED [0.0120s] [ 30%]
2025-12-04T14:00:07.9654397Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex32 PASSED [0.0101s] [ 30%]
2025-12-04T14:00:07.9654739Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex64 PASSED [0.0121s] [ 30%]
2025-12-04T14:00:07.9655078Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float16 PASSED [0.0121s] [ 30%]
2025-12-04T14:00:07.9655410Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float32 PASSED [0.0118s] [ 31%]
2025-12-04T14:00:07.9655755Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float64 PASSED [0.0124s] [ 31%]
2025-12-04T14:00:07.9656079Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int16 PASSED [0.0113s] [ 31%]
2025-12-04T14:00:07.9656452Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int32 PASSED [0.0112s] [ 31%]
2025-12-04T14:00:07.9656788Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int64 PASSED [0.0110s] [ 31%]
2025-12-04T14:00:07.9657149Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int8 PASSED [0.0117s] [ 31%]
2025-12-04T14:00:07.9657482Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_uint8 PASSED [0.0071s] [ 31%]
2025-12-04T14:00:07.9657807Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bfloat16 PASSED [0.0224s] [ 31%]
2025-12-04T14:00:07.9658117Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bool PASSED [0.0208s] [ 31%]
2025-12-04T14:00:07.9658458Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex128 PASSED [0.0224s] [ 31%]
2025-12-04T14:00:07.9658785Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex64 PASSED [0.0230s] [ 31%]
2025-12-04T14:00:07.9659149Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float16 PASSED [0.0221s] [ 31%]
2025-12-04T14:00:07.9659473Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float32 PASSED [0.0222s] [ 31%]
2025-12-04T14:00:07.9659796Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float64 PASSED [0.0221s] [ 31%]
2025-12-04T14:00:07.9660111Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int16 PASSED [0.0209s] [ 32%]
2025-12-04T14:00:07.9660478Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int32 PASSED [0.0209s] [ 32%]
2025-12-04T14:00:07.9660788Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int64 PASSED [0.0213s] [ 32%]
2025-12-04T14:00:07.9661142Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int8 PASSED [0.0208s] [ 32%]
2025-12-04T14:00:07.9661451Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_uint8 PASSED [0.0209s] [ 32%]
2025-12-04T14:00:07.9661782Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bfloat16 PASSED [0.0218s] [ 32%]
2025-12-04T14:00:07.9662085Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bool PASSED [0.0206s] [ 32%]
2025-12-04T14:00:07.9662419Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex128 PASSED [0.0222s] [ 32%]
2025-12-04T14:00:07.9662759Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex64 PASSED [0.0226s] [ 32%]
2025-12-04T14:00:07.9663078Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float16 PASSED [0.0218s] [ 32%]
2025-12-04T14:00:07.9663399Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float32 PASSED [0.0218s] [ 32%]
2025-12-04T14:00:07.9663719Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float64 PASSED [0.0218s] [ 32%]
2025-12-04T14:00:07.9664032Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int16 PASSED [0.0207s] [ 32%]
2025-12-04T14:00:07.9664346Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int32 PASSED [0.0207s] [ 32%]
2025-12-04T14:00:07.9664656Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int64 PASSED [0.0212s] [ 33%]
2025-12-04T14:00:07.9664962Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int8 PASSED [0.0206s] [ 33%]
2025-12-04T14:00:07.9665278Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_uint8 PASSED [0.0206s] [ 33%]
2025-12-04T14:00:07.9665601Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bfloat16 PASSED [0.0141s] [ 33%]
2025-12-04T14:00:07.9665911Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bool PASSED [0.0125s] [ 33%]
2025-12-04T14:00:07.9666248Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex128 PASSED [0.0142s] [ 33%]
2025-12-04T14:00:07.9666579Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex64 PASSED [0.0147s] [ 33%]
2025-12-04T14:00:07.9666960Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float16 PASSED [0.0139s] [ 33%]
2025-12-04T14:00:07.9667281Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float32 PASSED [0.0138s] [ 33%]
2025-12-04T14:00:07.9667648Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float64 PASSED [0.0138s] [ 33%]
2025-12-04T14:00:07.9667957Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int16 PASSED [0.0126s] [ 33%]
2025-12-04T14:00:07.9668266Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int32 PASSED [0.0128s] [ 33%]
2025-12-04T14:00:07.9668602Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int64 PASSED [0.0131s] [ 33%]
2025-12-04T14:00:07.9668944Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int8 PASSED [0.0126s] [ 33%]
2025-12-04T14:00:07.9669264Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_uint8 PASSED [0.0126s] [ 34%]
2025-12-04T14:00:07.9669585Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bfloat16 PASSED [0.0215s] [ 34%]
2025-12-04T14:00:07.9669896Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bool PASSED [0.0202s] [ 34%]
2025-12-04T14:00:07.9670239Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex128 PASSED [0.0217s] [ 34%]
2025-12-04T14:00:07.9670566Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex64 PASSED [0.0222s] [ 34%]
2025-12-04T14:00:07.9670884Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float16 PASSED [0.0214s] [ 34%]
2025-12-04T14:00:07.9671255Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float32 PASSED [0.0214s] [ 34%]
2025-12-04T14:00:07.9671610Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float64 PASSED [0.0214s] [ 34%]
2025-12-04T14:00:07.9671923Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int16 PASSED [0.0202s] [ 34%]
2025-12-04T14:00:07.9672231Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int32 PASSED [0.0201s] [ 34%]
2025-12-04T14:00:07.9672541Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int64 PASSED [0.0207s] [ 34%]
2025-12-04T14:00:07.9672857Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int8 PASSED [0.0202s] [ 34%]
2025-12-04T14:00:07.9673165Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_uint8 PASSED [0.0202s] [ 34%]
2025-12-04T14:00:07.9673490Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bfloat16 PASSED [0.0210s] [ 34%]
2025-12-04T14:00:07.9673798Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bool PASSED [0.0198s] [ 35%]
2025-12-04T14:00:07.9674134Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex128 PASSED [0.0213s] [ 35%]
2025-12-04T14:00:07.9674466Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex64 PASSED [0.0218s] [ 35%]
2025-12-04T14:00:07.9674786Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float16 PASSED [0.0210s] [ 35%]
2025-12-04T14:00:07.9675112Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float32 PASSED [0.0209s] [ 35%]
2025-12-04T14:00:07.9675431Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float64 PASSED [0.0209s] [ 35%]
2025-12-04T14:00:07.9675742Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int16 PASSED [0.0197s] [ 35%]
2025-12-04T14:00:07.9676056Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int32 PASSED [0.0197s] [ 35%]
2025-12-04T14:00:07.9676363Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int64 PASSED [0.0202s] [ 35%]
2025-12-04T14:00:07.9676678Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int8 PASSED [0.0198s] [ 35%]
2025-12-04T14:00:07.9676985Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_uint8 PASSED [0.0197s] [ 35%]
2025-12-04T14:00:07.9677314Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bfloat16 PASSED [0.0562s] [ 35%]
2025-12-04T14:00:07.9677681Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bool PASSED [0.0468s] [ 35%]
2025-12-04T14:00:07.9678063Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex128 PASSED [0.0577s] [ 35%]
2025-12-04T14:00:07.9678402Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex64 PASSED [0.0581s] [ 36%]
2025-12-04T14:00:07.9678733Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float16 PASSED [0.0557s] [ 36%]
2025-12-04T14:00:07.9679058Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float32 PASSED [0.0558s] [ 36%]
2025-12-04T14:00:07.9679390Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float64 PASSED [0.0556s] [ 36%]
2025-12-04T14:00:07.9679709Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int16 PASSED [0.0469s] [ 36%]
2025-12-04T14:00:07.9680031Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int32 PASSED [0.0468s] [ 36%]
2025-12-04T14:00:07.9680360Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int64 PASSED [0.0473s] [ 36%]
2025-12-04T14:00:07.9680677Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int8 PASSED [0.0470s] [ 36%]
2025-12-04T14:00:07.9680999Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_uint8 PASSED [0.0468s] [ 36%]
2025-12-04T14:00:07.9681373Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bfloat16 PASSED [0.0556s] [ 36%]
2025-12-04T14:00:07.9681691Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bool PASSED [0.0468s] [ 36%]
2025-12-04T14:00:07.9682081Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex128 PASSED [0.0576s] [ 36%]
2025-12-04T14:00:07.9682417Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex64 PASSED [0.0578s] [ 36%]
2025-12-04T14:00:07.9682749Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float16 PASSED [0.0558s] [ 36%]
2025-12-04T14:00:07.9683071Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float32 PASSED [0.0556s] [ 37%]
2025-12-04T14:00:07.9683399Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float64 PASSED [0.0557s] [ 37%]
2025-12-04T14:00:07.9683721Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int16 PASSED [0.0468s] [ 37%]
2025-12-04T14:00:07.9684040Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int32 PASSED [0.0468s] [ 37%]
2025-12-04T14:00:07.9684365Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int64 PASSED [0.0470s] [ 37%]
2025-12-04T14:00:07.9684684Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int8 PASSED [0.0467s] [ 37%]
2025-12-04T14:00:07.9685001Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_uint8 PASSED [0.0468s] [ 37%]
2025-12-04T14:00:07.9685341Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bfloat16 PASSED [0.0558s] [ 37%]
2025-12-04T14:00:07.9685657Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bool PASSED [0.0468s] [ 37%]
2025-12-04T14:00:07.9685999Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex128 PASSED [0.0575s] [ 37%]
2025-12-04T14:00:07.9686337Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex64 PASSED [0.0580s] [ 37%]
2025-12-04T14:00:07.9686666Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float16 PASSED [0.0556s] [ 37%]
2025-12-04T14:00:07.9687000Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float32 PASSED [0.0556s] [ 37%]
2025-12-04T14:00:07.9687325Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float64 PASSED [0.0555s] [ 38%]
2025-12-04T14:00:07.9687642Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int16 PASSED [0.0468s] [ 38%]
2025-12-04T14:00:07.9688018Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int32 PASSED [0.0468s] [ 38%]
2025-12-04T14:00:07.9688375Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int64 PASSED [0.0475s] [ 38%]
2025-12-04T14:00:07.9688726Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int8 PASSED [0.0470s] [ 38%]
2025-12-04T14:00:07.9689069Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_uint8 PASSED [0.0470s] [ 38%]
2025-12-04T14:00:07.9689403Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bfloat16 PASSED [0.0554s] [ 38%]
2025-12-04T14:00:07.9689728Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bool PASSED [0.0467s] [ 38%]
2025-12-04T14:00:07.9690071Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex128 PASSED [0.0576s] [ 38%]
2025-12-04T14:00:07.9690414Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex64 PASSED [0.0579s] [ 38%]
2025-12-04T14:00:07.9690744Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float16 PASSED [0.0554s] [ 38%]
2025-12-04T14:00:07.9691073Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float32 PASSED [0.0555s] [ 38%]
2025-12-04T14:00:07.9691400Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float64 PASSED [0.0555s] [ 38%]
2025-12-04T14:00:07.9691719Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int16 PASSED [0.0466s] [ 38%]
2025-12-04T14:00:07.9692115Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int32 PASSED [0.0467s] [ 39%]
2025-12-04T14:00:07.9692474Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int64 PASSED [0.0469s] [ 39%]
2025-12-04T14:00:07.9692793Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int8 PASSED [0.0466s] [ 39%]
2025-12-04T14:00:07.9693116Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_uint8 PASSED [0.0467s] [ 39%]
2025-12-04T14:00:07.9693446Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bfloat16 PASSED [0.0410s] [ 39%]
2025-12-04T14:00:07.9693767Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bool PASSED [0.0321s] [ 39%]
2025-12-04T14:00:07.9694118Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex128 PASSED [0.0430s] [ 39%]
2025-12-04T14:00:07.9694450Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex64 PASSED [0.0435s] [ 39%]
2025-12-04T14:00:07.9694781Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float16 PASSED [0.0410s] [ 39%]
2025-12-04T14:00:07.9695107Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float32 PASSED [0.0409s] [ 39%]
2025-12-04T14:00:07.9695432Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float64 PASSED [0.0409s] [ 39%]
2025-12-04T14:00:07.9695757Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int16 PASSED [0.0322s] [ 39%]
2025-12-04T14:00:07.9696080Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int32 PASSED [0.0321s] [ 39%]
2025-12-04T14:00:07.9696411Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int64 PASSED [0.0325s] [ 39%]
2025-12-04T14:00:07.9696727Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int8 PASSED [0.0321s] [ 40%]
2025-12-04T14:00:07.9697045Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_uint8 PASSED [0.0321s] [ 40%]
2025-12-04T14:00:07.9697382Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bfloat16 PASSED [0.0404s] [ 40%]
2025-12-04T14:00:07.9697705Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bool PASSED [0.0316s] [ 40%]
2025-12-04T14:00:07.9698051Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex128 PASSED [0.0424s] [ 40%]
2025-12-04T14:00:07.9698385Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex64 PASSED [0.0430s] [ 40%]
2025-12-04T14:00:07.9698863Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float16 PASSED [0.0405s] [ 40%]
2025-12-04T14:00:07.9699302Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float32 PASSED [0.0403s] [ 40%]
2025-12-04T14:00:07.9704294Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float64 PASSED [0.0403s] [ 40%]
2025-12-04T14:00:07.9704631Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int16 PASSED [0.0316s] [ 40%]
2025-12-04T14:00:07.9704955Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int32 PASSED [0.0316s] [ 40%]
2025-12-04T14:00:07.9705272Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int64 PASSED [0.0319s] [ 40%]
2025-12-04T14:00:07.9705588Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int8 PASSED [0.0315s] [ 40%]
2025-12-04T14:00:07.9705910Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_uint8 PASSED [0.0316s] [ 40%]
2025-12-04T14:00:07.9706237Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bfloat16 PASSED [0.0532s] [ 41%]
2025-12-04T14:00:07.9706560Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bool PASSED [0.0449s] [ 41%]
2025-12-04T14:00:07.9706904Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex128 PASSED [0.0551s] [ 41%]
2025-12-04T14:00:07.9707318Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex64 PASSED [0.0555s] [ 41%]
2025-12-04T14:00:07.9707648Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float16 PASSED [0.0532s] [ 41%]
2025-12-04T14:00:07.9708323Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float32 PASSED [0.0532s] [ 41%]
2025-12-04T14:00:07.9708663Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float64 PASSED [0.0532s] [ 41%]
2025-12-04T14:00:07.9708983Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int16 PASSED [0.0446s] [ 41%]
2025-12-04T14:00:07.9709297Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int32 PASSED [0.0444s] [ 41%]
2025-12-04T14:00:07.9709617Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int64 PASSED [0.0447s] [ 41%]
2025-12-04T14:00:07.9709929Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int8 PASSED [0.0446s] [ 41%]
2025-12-04T14:00:07.9710244Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_uint8 PASSED [0.0446s] [ 41%]
2025-12-04T14:00:07.9710577Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bfloat16 PASSED [0.0531s] [ 41%]
2025-12-04T14:00:07.9710892Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bool PASSED [0.0446s] [ 41%]
2025-12-04T14:00:07.9711235Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex128 PASSED [0.0552s] [ 42%]
2025-12-04T14:00:07.9711570Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex64 PASSED [0.0557s] [ 42%]
2025-12-04T14:00:07.9711893Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float16 PASSED [0.0533s] [ 42%]
2025-12-04T14:00:07.9712220Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float32 PASSED [0.0531s] [ 42%]
2025-12-04T14:00:07.9712543Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float64 PASSED [0.0530s] [ 42%]
2025-12-04T14:00:07.9712862Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int16 PASSED [0.0445s] [ 42%]
2025-12-04T14:00:07.9713177Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int32 PASSED [0.0446s] [ 42%]
2025-12-04T14:00:07.9713492Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int64 PASSED [0.0449s] [ 42%]
2025-12-04T14:00:07.9713811Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int8 PASSED [0.0445s] [ 42%]
2025-12-04T14:00:07.9714222Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_uint8 PASSED [0.0445s] [ 42%]
2025-12-04T14:00:07.9714612Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bfloat16 PASSED [0.0532s] [ 42%]
2025-12-04T14:00:07.9714927Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bool PASSED [0.0445s] [ 42%]
2025-12-04T14:00:07.9715265Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex128 PASSED [0.0552s] [ 42%]
2025-12-04T14:00:07.9715606Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex64 PASSED [0.0557s] [ 42%]
2025-12-04T14:00:07.9715929Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float16 PASSED [0.0532s] [ 43%]
2025-12-04T14:00:07.9716254Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float32 PASSED [0.0528s] [ 43%]
2025-12-04T14:00:07.9716574Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float64 PASSED [0.0532s] [ 43%]
2025-12-04T14:00:07.9716892Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int16 PASSED [0.0446s] [ 43%]
2025-12-04T14:00:07.9717212Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int32 PASSED [0.0445s] [ 43%]
2025-12-04T14:00:07.9717526Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int64 PASSED [0.0452s] [ 43%]
2025-12-04T14:00:07.9717839Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int8 PASSED [0.0443s] [ 43%]
2025-12-04T14:00:07.9718220Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_uint8 PASSED [0.0445s] [ 43%]
2025-12-04T14:00:07.9718602Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bfloat16 PASSED [0.0532s] [ 43%]
2025-12-04T14:00:07.9718918Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bool PASSED [0.0445s] [ 43%]
2025-12-04T14:00:07.9719257Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex128 PASSED [0.0551s] [ 43%]
2025-12-04T14:00:07.9719594Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex64 PASSED [0.0556s] [ 43%]
2025-12-04T14:00:07.9719920Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float16 PASSED [0.0532s] [ 43%]
2025-12-04T14:00:07.9720243Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float32 PASSED [0.0531s] [ 43%]
2025-12-04T14:00:07.9720567Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float64 PASSED [0.0531s] [ 44%]
2025-12-04T14:00:07.9720884Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int16 PASSED [0.0446s] [ 44%]
2025-12-04T14:00:07.9721201Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int32 PASSED [0.0445s] [ 44%]
2025-12-04T14:00:07.9721517Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int64 PASSED [0.0448s] [ 44%]
2025-12-04T14:00:07.9721833Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int8 PASSED [0.0446s] [ 44%]
2025-12-04T14:00:07.9722150Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_uint8 PASSED [0.0446s] [ 44%]
2025-12-04T14:00:07.9722533Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bfloat16 PASSED [0.0724s] [ 44%]
2025-12-04T14:00:07.9722898Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bool PASSED [0.0555s] [ 44%]
2025-12-04T14:00:07.9723296Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex128 PASSED [0.0764s] [ 44%]
2025-12-04T14:00:07.9723681Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex64 PASSED [0.0768s] [ 44%]
2025-12-04T14:00:07.9724061Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float16 PASSED [0.0722s] [ 44%]
2025-12-04T14:00:07.9724434Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float32 PASSED [0.0724s] [ 44%]
2025-12-04T14:00:07.9724856Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float64 PASSED [0.0724s] [ 44%]
2025-12-04T14:00:07.9725264Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int16 PASSED [0.0555s] [ 44%]
2025-12-04T14:00:07.9725632Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int32 PASSED [0.0555s] [ 45%]
2025-12-04T14:00:07.9726001Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int64 PASSED [0.0559s] [ 45%]
2025-12-04T14:00:07.9726364Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int8 PASSED [0.0553s] [ 45%]
2025-12-04T14:00:07.9726735Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_uint8 PASSED [0.0553s] [ 45%]
2025-12-04T14:00:07.9727114Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bfloat16 PASSED [0.0721s] [ 45%]
2025-12-04T14:00:07.9727477Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bool PASSED [0.0553s] [ 45%]
2025-12-04T14:00:07.9727871Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex128 PASSED [0.0761s] [ 45%]
2025-12-04T14:00:07.9728256Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex64 PASSED [0.0767s] [ 45%]
2025-12-04T14:00:07.9728639Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float16 PASSED [0.0723s] [ 45%]
2025-12-04T14:00:07.9729103Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float32 PASSED [0.0722s] [ 45%]
2025-12-04T14:00:07.9729516Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float64 PASSED [0.0722s] [ 45%]
2025-12-04T14:00:07.9729879Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int16 PASSED [0.0554s] [ 45%]
2025-12-04T14:00:07.9730249Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int32 PASSED [0.0554s] [ 45%]
2025-12-04T14:00:07.9730613Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int64 PASSED [0.0556s] [ 45%]
2025-12-04T14:00:07.9730980Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int8 PASSED [0.0552s] [ 46%]
2025-12-04T14:00:07.9731344Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_uint8 PASSED [0.0553s] [ 46%]
2025-12-04T14:00:07.9731724Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bfloat16 PASSED [0.1153s] [ 46%]
2025-12-04T14:00:07.9732092Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bool PASSED [0.0987s] [ 46%]
2025-12-04T14:00:07.9732481Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex128 PASSED [0.1196s] [ 46%]
2025-12-04T14:00:07.9732869Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex64 PASSED [0.1202s] [ 46%]
2025-12-04T14:00:07.9733243Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float16 PASSED [0.1159s] [ 46%]
2025-12-04T14:00:07.9733617Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float32 PASSED [0.1157s] [ 46%]
2025-12-04T14:00:07.9733990Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float64 PASSED [0.1154s] [ 46%]
2025-12-04T14:00:07.9734356Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int16 PASSED [0.0996s] [ 46%]
2025-12-04T14:00:07.9734723Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int32 PASSED [0.0995s] [ 46%]
2025-12-04T14:00:07.9735093Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int64 PASSED [0.1001s] [ 46%]
2025-12-04T14:00:07.9735455Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int8 PASSED [0.0994s] [ 46%]
2025-12-04T14:00:07.9735867Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_uint8 PASSED [0.0994s] [ 46%]
2025-12-04T14:00:07.9736313Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bfloat16 PASSED [0.1154s] [ 47%]
2025-12-04T14:00:07.9736680Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bool PASSED [0.0988s] [ 47%]
2025-12-04T14:00:07.9737072Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex128 PASSED [0.1191s] [ 47%]
2025-12-04T14:00:07.9737455Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex64 PASSED [0.1192s] [ 47%]
2025-12-04T14:00:07.9737833Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float16 PASSED [0.1151s] [ 47%]
2025-12-04T14:00:07.9738204Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float32 PASSED [0.1150s] [ 47%]
2025-12-04T14:00:07.9738588Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float64 PASSED [0.1150s] [ 47%]
2025-12-04T14:00:07.9739003Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int16 PASSED [0.0987s] [ 47%]
2025-12-04T14:00:07.9739435Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int32 PASSED [0.0988s] [ 47%]
2025-12-04T14:00:07.9739804Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int64 PASSED [0.0989s] [ 47%]
2025-12-04T14:00:07.9740212Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int8 PASSED [0.0987s] [ 47%]
2025-12-04T14:00:07.9740622Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_uint8 PASSED [0.0988s] [ 47%]
2025-12-04T14:00:07.9740998Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bfloat16 PASSED [0.1080s] [ 47%]
2025-12-04T14:00:07.9741365Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bool PASSED [0.0900s] [ 47%]
2025-12-04T14:00:07.9741758Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex128 PASSED [0.1096s] [ 48%]
2025-12-04T14:00:07.9742146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex64 PASSED [0.1099s] [ 48%]
2025-12-04T14:00:07.9742522Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float16 PASSED [0.1062s] [ 48%]
2025-12-04T14:00:07.9742895Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float32 PASSED [0.1062s] [ 48%]
2025-12-04T14:00:07.9743269Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float64 PASSED [0.1058s] [ 48%]
2025-12-04T14:00:07.9743636Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int16 PASSED [0.0901s] [ 48%]
2025-12-04T14:00:07.9743999Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int32 PASSED [0.0898s] [ 48%]
2025-12-04T14:00:07.9744369Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int64 PASSED [0.0907s] [ 48%]
2025-12-04T14:00:07.9744734Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int8 PASSED [0.0901s] [ 48%]
2025-12-04T14:00:07.9745098Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_uint8 PASSED [0.0896s] [ 48%]
2025-12-04T14:00:07.9745483Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bfloat16 PASSED [0.1058s] [ 48%]
2025-12-04T14:00:07.9745844Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bool PASSED [0.0897s] [ 48%]
2025-12-04T14:00:07.9746247Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex128 PASSED [0.1092s] [ 48%]
2025-12-04T14:00:07.9746632Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex64 PASSED [0.1098s] [ 48%]
2025-12-04T14:00:07.9747051Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float16 PASSED [0.1055s] [ 49%]
2025-12-04T14:00:07.9747468Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float32 PASSED [0.1053s] [ 49%]
2025-12-04T14:00:07.9747842Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float64 PASSED [0.1051s] [ 49%]
2025-12-04T14:00:07.9748210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int16 PASSED [0.0890s] [ 49%]
2025-12-04T14:00:07.9748577Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int32 PASSED [0.0889s] [ 49%]
2025-12-04T14:00:07.9748944Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int64 PASSED [0.0897s] [ 49%]
2025-12-04T14:00:07.9749312Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int8 PASSED [0.0891s] [ 49%]
2025-12-04T14:00:07.9749680Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_uint8 PASSED [0.0897s] [ 49%]
2025-12-04T14:00:07.9750119Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bfloat16 SKIPPED [0.0028s] (NOT IMPL) [ 49%]
2025-12-04T14:00:07.9750535Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 49%]
2025-12-04T14:00:07.9751017Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 49%]
2025-12-04T14:00:07.9751458Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 49%]
2025-12-04T14:00:07.9751928Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float16 SKIPPED [0.0026s] (NOT IMPL) [ 49%]
2025-12-04T14:00:07.9752355Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9752780Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9753198Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9753616Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9754034Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int64 SKIPPED [0.0025s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9754458Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9754874Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9755305Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9755723Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9756162Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex128 SKIPPED [0.0025s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9756598Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9757023Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9757448Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9757873Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 50%]
2025-12-04T14:00:07.9758337Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int16 SKIPPED [0.0024s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9758798Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9759213Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9759629Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9760047Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9760477Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bfloat16 SKIPPED [0.0025s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9760891Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9761329Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9761771Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9762196Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9762664Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float32 SKIPPED [0.0025s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9763129Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9763543Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9763957Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 51%]
2025-12-04T14:00:07.9764374Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9764789Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int8 SKIPPED [0.0024s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9765206Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9765636Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9766049Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9766490Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9766923Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex64 SKIPPED [0.0024s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9767351Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float16 SKIPPED [0.0020s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9767775Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9768199Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float64 SKIPPED [0.0020s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9768616Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9769031Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int32 SKIPPED [0.0024s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9769445Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9769903Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 52%]
2025-12-04T14:00:07.9770359Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 53%]
2025-12-04T14:00:07.9770742Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bfloat16 PASSED [0.1164s] [ 53%]
2025-12-04T14:00:07.9771108Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bool PASSED [0.0996s] [ 53%]
2025-12-04T14:00:07.9771501Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex128 PASSED [0.1191s] [ 53%]
2025-12-04T14:00:07.9771889Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex64 PASSED [0.1196s] [ 53%]
2025-12-04T14:00:07.9772260Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float16 PASSED [0.1161s] [ 53%]
2025-12-04T14:00:07.9772640Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float32 PASSED [0.1160s] [ 53%]
2025-12-04T14:00:07.9773013Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float64 PASSED [0.1163s] [ 53%]
2025-12-04T14:00:07.9773380Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int16 PASSED [0.0995s] [ 53%]
2025-12-04T14:00:07.9773747Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int32 PASSED [0.0995s] [ 53%]
2025-12-04T14:00:07.9774235Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int64 PASSED [0.0997s] [ 53%]
2025-12-04T14:00:07.9774654Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int8 PASSED [0.0999s] [ 53%]
2025-12-04T14:00:07.9775020Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_uint8 PASSED [0.0997s] [ 53%]
2025-12-04T14:00:07.9775402Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bfloat16 PASSED [0.1157s] [ 53%]
2025-12-04T14:00:07.9775766Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bool PASSED [0.0987s] [ 54%]
2025-12-04T14:00:07.9776157Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex128 PASSED [0.1191s] [ 54%]
2025-12-04T14:00:07.9776545Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex64 PASSED [0.1188s] [ 54%]
2025-12-04T14:00:07.9776918Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float16 PASSED [0.1150s] [ 54%]
2025-12-04T14:00:07.9777294Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float32 PASSED [0.1153s] [ 54%]
2025-12-04T14:00:07.9777664Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float64 PASSED [0.1157s] [ 54%]
2025-12-04T14:00:07.9778029Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int16 PASSED [0.0987s] [ 54%]
2025-12-04T14:00:07.9778396Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int32 PASSED [0.0989s] [ 54%]
2025-12-04T14:00:07.9778762Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int64 PASSED [0.0985s] [ 54%]
2025-12-04T14:00:07.9779182Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int8 PASSED [0.0988s] [ 54%]
2025-12-04T14:00:07.9779550Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_uint8 PASSED [0.0987s] [ 54%]
2025-12-04T14:00:07.9779930Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bfloat16 PASSED [0.0728s] [ 54%]
2025-12-04T14:00:07.9780294Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bool PASSED [0.0553s] [ 54%]
2025-12-04T14:00:07.9780685Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex128 PASSED [0.0763s] [ 54%]
2025-12-04T14:00:07.9781149Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex64 PASSED [0.0762s] [ 55%]
2025-12-04T14:00:07.9781561Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float16 PASSED [0.0723s] [ 55%]
2025-12-04T14:00:07.9781938Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float32 PASSED [0.0724s] [ 55%]
2025-12-04T14:00:07.9782314Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float64 PASSED [0.0727s] [ 55%]
2025-12-04T14:00:07.9782678Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int16 PASSED [0.0553s] [ 55%]
2025-12-04T14:00:07.9783051Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int32 PASSED [0.0553s] [ 55%]
2025-12-04T14:00:07.9783414Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int64 PASSED [0.0555s] [ 55%]
2025-12-04T14:00:07.9783777Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int8 PASSED [0.0554s] [ 55%]
2025-12-04T14:00:07.9784145Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_uint8 PASSED [0.0554s] [ 55%]
2025-12-04T14:00:07.9784523Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bfloat16 PASSED [0.0729s] [ 55%]
2025-12-04T14:00:07.9784884Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bool PASSED [0.0555s] [ 55%]
2025-12-04T14:00:07.9785315Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex128 PASSED [0.0763s] [ 55%]
2025-12-04T14:00:07.9785736Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex64 PASSED [0.0761s] [ 55%]
2025-12-04T14:00:07.9786113Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float16 PASSED [0.0722s] [ 55%]
2025-12-04T14:00:07.9786486Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float32 PASSED [0.0723s] [ 56%]
2025-12-04T14:00:07.9786859Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float64 PASSED [0.0726s] [ 56%]
2025-12-04T14:00:07.9787226Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int16 PASSED [0.0552s] [ 56%]
2025-12-04T14:00:07.9787589Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int32 PASSED [0.0554s] [ 56%]
2025-12-04T14:00:07.9787957Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int64 PASSED [0.0552s] [ 56%]
2025-12-04T14:00:07.9788321Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int8 PASSED [0.0551s] [ 56%]
2025-12-04T14:00:07.9788736Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_uint8 PASSED [0.0552s] [ 56%]
2025-12-04T14:00:07.9789117Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bfloat16 PASSED [0.1079s] [ 56%]
2025-12-04T14:00:07.9789479Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bool PASSED [0.0915s] [ 56%]
2025-12-04T14:00:07.9789872Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex128 PASSED [0.1112s] [ 56%]
2025-12-04T14:00:07.9790256Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex64 PASSED [0.1113s] [ 56%]
2025-12-04T14:00:07.9790628Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float16 PASSED [0.1079s] [ 56%]
2025-12-04T14:00:07.9791003Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float32 PASSED [0.1077s] [ 56%]
2025-12-04T14:00:07.9791373Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float64 PASSED [0.1082s] [ 56%]
2025-12-04T14:00:07.9791738Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int16 PASSED [0.0917s] [ 57%]
2025-12-04T14:00:07.9792146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int32 PASSED [0.0916s] [ 57%]
2025-12-04T14:00:07.9792550Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int64 PASSED [0.0917s] [ 57%]
2025-12-04T14:00:07.9792915Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int8 PASSED [0.0916s] [ 57%]
2025-12-04T14:00:07.9793281Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_uint8 PASSED [0.0918s] [ 57%]
2025-12-04T14:00:07.9793659Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bfloat16 PASSED [0.1081s] [ 57%]
2025-12-04T14:00:07.9794023Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bool PASSED [0.0912s] [ 57%]
2025-12-04T14:00:07.9794411Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex128 PASSED [0.1110s] [ 57%]
2025-12-04T14:00:07.9794799Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex64 PASSED [0.1102s] [ 57%]
2025-12-04T14:00:07.9795172Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float16 PASSED [0.1067s] [ 57%]
2025-12-04T14:00:07.9795548Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float32 PASSED [0.1066s] [ 57%]
2025-12-04T14:00:07.9795960Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float64 PASSED [0.1076s] [ 57%]
2025-12-04T14:00:07.9796326Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int16 PASSED [0.0914s] [ 57%]
2025-12-04T14:00:07.9796737Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int32 PASSED [0.0912s] [ 57%]
2025-12-04T14:00:07.9797099Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int64 PASSED [0.0907s] [ 58%]
2025-12-04T14:00:07.9797466Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int8 PASSED [0.0914s] [ 58%]
2025-12-04T14:00:07.9797833Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_uint8 PASSED [0.0909s] [ 58%]
2025-12-04T14:00:07.9798264Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bfloat16 SKIPPED [0.0023s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9798720Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bool SKIPPED [0.0030s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9799174Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9799614Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9800037Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9800461Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9800888Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float64 SKIPPED [0.0025s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9801301Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9801719Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9802131Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9802546Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 58%]
2025-12-04T14:00:07.9802963Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_uint8 SKIPPED [0.0025s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9803435Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9803889Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9804328Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9804763Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9805190Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float16 SKIPPED [0.0025s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9805614Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9806041Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9806457Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9806870Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9807287Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int64 SKIPPED [0.0024s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9808518Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9809058Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9809488Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 59%]
2025-12-04T14:00:07.9809903Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9810343Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex128 SKIPPED [0.0024s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9810775Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9811203Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9811625Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9812053Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9812473Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int16 SKIPPED [0.0025s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9812890Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9813307Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int64 SKIPPED [0.0020s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9813718Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9814136Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9814570Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bfloat16 SKIPPED [0.0024s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9814982Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9815514Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex128 SKIPPED [0.0020s] (NOT IMPL) [ 60%]
2025-12-04T14:00:07.9816031Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9816487Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9816942Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float32 SKIPPED [0.0024s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9817397Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9817845Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9818289Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9818735Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9819229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int8 SKIPPED [0.0025s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9819647Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 61%]
2025-12-04T14:00:07.9820099Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bfloat16 PASSED [0.0717s] [ 61%]
2025-12-04T14:00:07.9820467Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bool PASSED [0.0633s] [ 61%]
2025-12-04T14:00:07.9820928Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex128 PASSED [0.0732s] [ 61%]
2025-12-04T14:00:07.9821327Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex64 PASSED [0.0737s] [ 61%]
2025-12-04T14:00:07.9821708Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float16 PASSED [0.0715s] [ 61%]
2025-12-04T14:00:07.9822093Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float32 PASSED [0.0715s] [ 62%]
2025-12-04T14:00:07.9822467Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float64 PASSED [0.0713s] [ 62%]
2025-12-04T14:00:07.9822838Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int16 PASSED [0.0630s] [ 62%]
2025-12-04T14:00:07.9823218Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int32 PASSED [0.0630s] [ 62%]
2025-12-04T14:00:07.9823590Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int64 PASSED [0.0636s] [ 62%]
2025-12-04T14:00:07.9823967Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int8 PASSED [0.0630s] [ 62%]
2025-12-04T14:00:07.9824336Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_uint8 PASSED [0.0631s] [ 62%]
2025-12-04T14:00:07.9824724Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bfloat16 PASSED [0.0708s] [ 62%]
2025-12-04T14:00:07.9825105Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bool PASSED [0.0626s] [ 62%]
2025-12-04T14:00:07.9825509Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex128 PASSED [0.0727s] [ 62%]
2025-12-04T14:00:07.9825904Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex64 PASSED [0.0735s] [ 62%]
2025-12-04T14:00:07.9826280Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float16 PASSED [0.0708s] [ 62%]
2025-12-04T14:00:07.9826656Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float32 PASSED [0.0708s] [ 62%]
2025-12-04T14:00:07.9827038Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float64 PASSED [0.0708s] [ 63%]
2025-12-04T14:00:07.9827453Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int16 PASSED [0.0623s] [ 63%]
2025-12-04T14:00:07.9827870Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int32 PASSED [0.0625s] [ 63%]
2025-12-04T14:00:07.9828239Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int64 PASSED [0.0631s] [ 63%]
2025-12-04T14:00:07.9828605Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int8 PASSED [0.0624s] [ 63%]
2025-12-04T14:00:07.9828978Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_uint8 PASSED [0.0627s] [ 63%]
2025-12-04T14:00:07.9829362Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bfloat16 PASSED [0.0679s] [ 63%]
2025-12-04T14:00:07.9829731Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bool PASSED [0.0596s] [ 63%]
2025-12-04T14:00:07.9830129Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex128 PASSED [0.0697s] [ 63%]
2025-12-04T14:00:07.9830519Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex64 PASSED [0.0701s] [ 63%]
2025-12-04T14:00:07.9830903Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float16 PASSED [0.0676s] [ 63%]
2025-12-04T14:00:07.9831319Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float32 PASSED [0.0677s] [ 63%]
2025-12-04T14:00:07.9831701Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float64 PASSED [0.0676s] [ 63%]
2025-12-04T14:00:07.9832107Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int16 PASSED [0.0595s] [ 63%]
2025-12-04T14:00:07.9832478Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int32 PASSED [0.0593s] [ 64%]
2025-12-04T14:00:07.9832855Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int64 PASSED [0.0599s] [ 64%]
2025-12-04T14:00:07.9833223Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int8 PASSED [0.0594s] [ 64%]
2025-12-04T14:00:07.9833600Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_uint8 PASSED [0.0594s] [ 64%]
2025-12-04T14:00:07.9833979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bfloat16 PASSED [0.0672s] [ 64%]
2025-12-04T14:00:07.9834345Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bool PASSED [0.0590s] [ 64%]
2025-12-04T14:00:07.9834746Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex128 PASSED [0.0689s] [ 64%]
2025-12-04T14:00:07.9835134Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex64 PASSED [0.0692s] [ 64%]
2025-12-04T14:00:07.9835515Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float16 PASSED [0.0671s] [ 64%]
2025-12-04T14:00:07.9835894Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float32 PASSED [0.0673s] [ 64%]
2025-12-04T14:00:07.9836269Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float64 PASSED [0.0671s] [ 64%]
2025-12-04T14:00:07.9836648Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int16 PASSED [0.0590s] [ 64%]
2025-12-04T14:00:07.9837020Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int32 PASSED [0.0590s] [ 64%]
2025-12-04T14:00:07.9837393Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int64 PASSED [0.0593s] [ 64%]
2025-12-04T14:00:07.9837759Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int8 PASSED [0.0589s] [ 65%]
2025-12-04T14:00:07.9838125Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_uint8 PASSED [0.0590s] [ 65%]
2025-12-04T14:00:07.9838562Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bfloat16 PASSED [0.0618s] [ 65%]
2025-12-04T14:00:07.9839015Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bool PASSED [0.0470s] [ 65%]
2025-12-04T14:00:07.9839413Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex128 PASSED [0.0652s] [ 65%]
2025-12-04T14:00:07.9839802Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex64 PASSED [0.0657s] [ 65%]
2025-12-04T14:00:07.9840177Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float16 PASSED [0.0618s] [ 65%]
2025-12-04T14:00:07.9840558Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float32 PASSED [0.0617s] [ 65%]
2025-12-04T14:00:07.9840931Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float64 PASSED [0.0617s] [ 65%]
2025-12-04T14:00:07.9841307Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int16 PASSED [0.0470s] [ 65%]
2025-12-04T14:00:07.9841677Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int32 PASSED [0.0468s] [ 65%]
2025-12-04T14:00:07.9842043Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int64 PASSED [0.0472s] [ 65%]
2025-12-04T14:00:07.9842466Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int8 PASSED [0.0467s] [ 65%]
2025-12-04T14:00:07.9842835Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_uint8 PASSED [0.0469s] [ 65%]
2025-12-04T14:00:07.9843262Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bfloat16 PASSED [0.0614s] [ 66%]
2025-12-04T14:00:07.9843631Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bool PASSED [0.0462s] [ 66%]
2025-12-04T14:00:07.9844030Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex128 PASSED [0.0647s] [ 66%]
2025-12-04T14:00:07.9844422Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex64 PASSED [0.0652s] [ 66%]
2025-12-04T14:00:07.9844797Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float16 PASSED [0.0613s] [ 66%]
2025-12-04T14:00:07.9845172Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float32 PASSED [0.0613s] [ 66%]
2025-12-04T14:00:07.9845555Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float64 PASSED [0.0612s] [ 66%]
2025-12-04T14:00:07.9845923Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int16 PASSED [0.0462s] [ 66%]
2025-12-04T14:00:07.9846299Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int32 PASSED [0.0464s] [ 66%]
2025-12-04T14:00:07.9846674Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int64 PASSED [0.0467s] [ 66%]
2025-12-04T14:00:07.9847039Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int8 PASSED [0.0461s] [ 66%]
2025-12-04T14:00:07.9847419Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_uint8 PASSED [0.0462s] [ 66%]
2025-12-04T14:00:07.9847800Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bfloat16 PASSED [0.0630s] [ 66%]
2025-12-04T14:00:07.9848173Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bool PASSED [0.0548s] [ 66%]
2025-12-04T14:00:07.9848572Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex128 PASSED [0.0647s] [ 67%]
2025-12-04T14:00:07.9848958Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex64 PASSED [0.0653s] [ 67%]
2025-12-04T14:00:07.9849341Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float16 PASSED [0.0630s] [ 67%]
2025-12-04T14:00:07.9849761Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float32 PASSED [0.0630s] [ 67%]
2025-12-04T14:00:07.9850179Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float64 PASSED [0.0629s] [ 67%]
2025-12-04T14:00:07.9850546Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int16 PASSED [0.0548s] [ 67%]
2025-12-04T14:00:07.9850913Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int32 PASSED [0.0548s] [ 67%]
2025-12-04T14:00:07.9851288Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int64 PASSED [0.0555s] [ 67%]
2025-12-04T14:00:07.9851656Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int8 PASSED [0.0547s] [ 67%]
2025-12-04T14:00:07.9852026Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_uint8 PASSED [0.0547s] [ 67%]
2025-12-04T14:00:07.9852408Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bfloat16 PASSED [0.0625s] [ 67%]
2025-12-04T14:00:07.9852775Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bool PASSED [0.0542s] [ 67%]
2025-12-04T14:00:07.9853171Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex128 PASSED [0.0642s] [ 67%]
2025-12-04T14:00:07.9853599Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex64 PASSED [0.0650s] [ 67%]
2025-12-04T14:00:07.9853978Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float16 PASSED [0.0625s] [ 68%]
2025-12-04T14:00:07.9854531Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float32 PASSED [0.0625s] [ 68%]
2025-12-04T14:00:07.9854906Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float64 PASSED [0.0625s] [ 68%]
2025-12-04T14:00:07.9855281Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int16 PASSED [0.0542s] [ 68%]
2025-12-04T14:00:07.9855651Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int32 PASSED [0.0542s] [ 68%]
2025-12-04T14:00:07.9856022Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int64 PASSED [0.0545s] [ 68%]
2025-12-04T14:00:07.9856385Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int8 PASSED [0.0541s] [ 68%]
2025-12-04T14:00:07.9856755Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_uint8 PASSED [0.0541s] [ 68%]
2025-12-04T14:00:07.9857146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bfloat16 PASSED [0.0593s] [ 68%]
2025-12-04T14:00:07.9857509Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bool PASSED [0.0508s] [ 68%]
2025-12-04T14:00:07.9857908Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex128 PASSED [0.0610s] [ 68%]
2025-12-04T14:00:07.9858296Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex64 PASSED [0.0614s] [ 68%]
2025-12-04T14:00:07.9858699Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float16 PASSED [0.0593s] [ 68%]
2025-12-04T14:00:07.9859148Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float32 PASSED [0.0592s] [ 68%]
2025-12-04T14:00:07.9859526Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float64 PASSED [0.0591s] [ 69%]
2025-12-04T14:00:07.9859901Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int16 PASSED [0.0510s] [ 69%]
2025-12-04T14:00:07.9860268Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int32 PASSED [0.0509s] [ 69%]
2025-12-04T14:00:07.9860633Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int64 PASSED [0.0513s] [ 69%]
2025-12-04T14:00:07.9861050Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int8 PASSED [0.0509s] [ 69%]
2025-12-04T14:00:07.9861460Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_uint8 PASSED [0.0510s] [ 69%]
2025-12-04T14:00:07.9861844Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bfloat16 PASSED [0.0588s] [ 69%]
2025-12-04T14:00:07.9862210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bool PASSED [0.0505s] [ 69%]
2025-12-04T14:00:07.9862604Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex128 PASSED [0.0605s] [ 69%]
2025-12-04T14:00:07.9862999Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex64 PASSED [0.0610s] [ 69%]
2025-12-04T14:00:07.9863371Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float16 PASSED [0.0587s] [ 69%]
2025-12-04T14:00:07.9863751Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float32 PASSED [0.0587s] [ 69%]
2025-12-04T14:00:07.9864124Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float64 PASSED [0.0586s] [ 69%]
2025-12-04T14:00:07.9864489Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int16 PASSED [0.0505s] [ 69%]
2025-12-04T14:00:07.9864927Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int32 PASSED [0.0504s] [ 70%]
2025-12-04T14:00:07.9865295Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int64 PASSED [0.0508s] [ 70%]
2025-12-04T14:00:07.9865699Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int8 PASSED [0.0504s] [ 70%]
2025-12-04T14:00:07.9866069Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_uint8 PASSED [0.0505s] [ 70%]
2025-12-04T14:00:07.9866450Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bfloat16 PASSED [0.0603s] [ 70%]
2025-12-04T14:00:07.9866824Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bool PASSED [0.0517s] [ 70%]
2025-12-04T14:00:07.9867215Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex128 PASSED [0.0619s] [ 70%]
2025-12-04T14:00:07.9867601Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex64 PASSED [0.0624s] [ 70%]
2025-12-04T14:00:07.9867979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float16 PASSED [0.0600s] [ 70%]
2025-12-04T14:00:07.9868364Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float32 PASSED [0.0599s] [ 70%]
2025-12-04T14:00:07.9868787Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float64 PASSED [0.0600s] [ 70%]
2025-12-04T14:00:07.9869155Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int16 PASSED [0.0517s] [ 70%]
2025-12-04T14:00:07.9869524Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int32 PASSED [0.0515s] [ 70%]
2025-12-04T14:00:07.9869898Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int64 PASSED [0.0521s] [ 70%]
2025-12-04T14:00:07.9870262Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int8 PASSED [0.0516s] [ 71%]
2025-12-04T14:00:07.9870635Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_uint8 PASSED [0.0517s] [ 71%]
2025-12-04T14:00:07.9871016Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bfloat16 PASSED [0.0599s] [ 71%]
2025-12-04T14:00:07.9871381Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bool PASSED [0.0516s] [ 71%]
2025-12-04T14:00:07.9871781Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex128 PASSED [0.0617s] [ 71%]
2025-12-04T14:00:07.9872219Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex64 PASSED [0.0622s] [ 71%]
2025-12-04T14:00:07.9872642Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float16 PASSED [0.0598s] [ 71%]
2025-12-04T14:00:07.9873016Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float32 PASSED [0.0599s] [ 71%]
2025-12-04T14:00:07.9873391Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float64 PASSED [0.0599s] [ 71%]
2025-12-04T14:00:07.9873761Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int16 PASSED [0.0516s] [ 71%]
2025-12-04T14:00:07.9874129Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int32 PASSED [0.0516s] [ 71%]
2025-12-04T14:00:07.9874499Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int64 PASSED [0.0520s] [ 71%]
2025-12-04T14:00:07.9874866Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int8 PASSED [0.0516s] [ 71%]
2025-12-04T14:00:07.9875230Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_uint8 PASSED [0.0516s] [ 71%]
2025-12-04T14:00:07.9875618Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bfloat16 PASSED [0.0725s] [ 72%]
2025-12-04T14:00:07.9876028Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bool PASSED [0.0642s] [ 72%]
2025-12-04T14:00:07.9876429Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex128 PASSED [0.0743s] [ 72%]
2025-12-04T14:00:07.9876854Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex64 PASSED [0.0748s] [ 72%]
2025-12-04T14:00:07.9877229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float16 PASSED [0.0724s] [ 72%]
2025-12-04T14:00:07.9877609Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float32 PASSED [0.0724s] [ 72%]
2025-12-04T14:00:07.9877984Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float64 PASSED [0.0724s] [ 72%]
2025-12-04T14:00:07.9878358Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int16 PASSED [0.0642s] [ 72%]
2025-12-04T14:00:07.9878777Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int32 PASSED [0.0640s] [ 72%]
2025-12-04T14:00:07.9879143Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int64 PASSED [0.0644s] [ 72%]
2025-12-04T14:00:07.9879515Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int8 PASSED [0.0641s] [ 72%]
2025-12-04T14:00:07.9879880Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_uint8 PASSED [0.0641s] [ 72%]
2025-12-04T14:00:07.9880273Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bfloat16 PASSED [0.0719s] [ 72%]
2025-12-04T14:00:07.9880638Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bool PASSED [0.0635s] [ 72%]
2025-12-04T14:00:07.9881032Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex128 PASSED [0.0736s] [ 73%]
2025-12-04T14:00:07.9881429Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex64 PASSED [0.0744s] [ 73%]
2025-12-04T14:00:07.9881806Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float16 PASSED [0.0717s] [ 73%]
2025-12-04T14:00:07.9882189Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float32 PASSED [0.0719s] [ 73%]
2025-12-04T14:00:07.9882565Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float64 PASSED [0.0719s] [ 73%]
2025-12-04T14:00:07.9882979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int16 PASSED [0.0635s] [ 73%]
2025-12-04T14:00:07.9883355Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int32 PASSED [0.0636s] [ 73%]
2025-12-04T14:00:07.9883762Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int64 PASSED [0.0638s] [ 73%]
2025-12-04T14:00:07.9884132Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int8 PASSED [0.0635s] [ 73%]
2025-12-04T14:00:07.9884505Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_uint8 PASSED [0.0634s] [ 73%]
2025-12-04T14:00:07.9884890Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bfloat16 PASSED [0.0810s] [ 73%]
2025-12-04T14:00:07.9885258Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bool PASSED [0.0661s] [ 73%]
2025-12-04T14:00:07.9885653Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex128 PASSED [0.0840s] [ 73%]
2025-12-04T14:00:07.9886047Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex64 PASSED [0.0846s] [ 73%]
2025-12-04T14:00:07.9886427Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float16 PASSED [0.0808s] [ 74%]
2025-12-04T14:00:07.9886802Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float32 PASSED [0.0806s] [ 74%]
2025-12-04T14:00:07.9887224Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float64 PASSED [0.0806s] [ 74%]
2025-12-04T14:00:07.9887596Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int16 PASSED [0.0660s] [ 74%]
2025-12-04T14:00:07.9888005Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int32 PASSED [0.0660s] [ 74%]
2025-12-04T14:00:07.9888380Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int64 PASSED [0.0665s] [ 74%]
2025-12-04T14:00:07.9888748Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int8 PASSED [0.0661s] [ 74%]
2025-12-04T14:00:07.9889123Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_uint8 PASSED [0.0661s] [ 74%]
2025-12-04T14:00:07.9889502Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bfloat16 PASSED [0.0806s] [ 74%]
2025-12-04T14:00:07.9889867Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bool PASSED [0.0658s] [ 74%]
2025-12-04T14:00:07.9890264Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex128 PASSED [0.0832s] [ 74%]
2025-12-04T14:00:07.9890652Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex64 PASSED [0.0841s] [ 74%]
2025-12-04T14:00:07.9891035Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float16 PASSED [0.0806s] [ 74%]
2025-12-04T14:00:07.9891412Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float32 PASSED [0.0807s] [ 75%]
2025-12-04T14:00:07.9891788Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float64 PASSED [0.0808s] [ 75%]
2025-12-04T14:00:07.9892163Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int16 PASSED [0.0662s] [ 75%]
2025-12-04T14:00:07.9892535Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int32 PASSED [0.0660s] [ 75%]
2025-12-04T14:00:07.9892907Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int64 PASSED [0.0664s] [ 75%]
2025-12-04T14:00:07.9893275Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int8 PASSED [0.0655s] [ 75%]
2025-12-04T14:00:07.9893644Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_uint8 PASSED [0.0659s] [ 75%]
2025-12-04T14:00:07.9894029Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bfloat16 PASSED [0.0657s] [ 75%]
2025-12-04T14:00:07.9894438Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bool PASSED [0.0506s] [ 75%]
2025-12-04T14:00:07.9894878Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex128 PASSED [0.0691s] [ 75%]
2025-12-04T14:00:07.9895268Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex64 PASSED [0.0696s] [ 75%]
2025-12-04T14:00:07.9895644Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float16 PASSED [0.0658s] [ 75%]
2025-12-04T14:00:07.9896026Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float32 PASSED [0.0659s] [ 75%]
2025-12-04T14:00:07.9896399Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float64 PASSED [0.0657s] [ 75%]
2025-12-04T14:00:07.9896774Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int16 PASSED [0.0505s] [ 76%]
2025-12-04T14:00:07.9897146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int32 PASSED [0.0503s] [ 76%]
2025-12-04T14:00:07.9897515Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int64 PASSED [0.0508s] [ 76%]
2025-12-04T14:00:07.9897883Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int8 PASSED [0.0504s] [ 76%]
2025-12-04T14:00:07.9898293Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_uint8 PASSED [0.0506s] [ 76%]
2025-12-04T14:00:07.9898723Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bfloat16 PASSED [0.0657s] [ 76%]
2025-12-04T14:00:07.9899179Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bool PASSED [0.0503s] [ 76%]
2025-12-04T14:00:07.9899576Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex128 PASSED [0.0692s] [ 76%]
2025-12-04T14:00:07.9899973Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex64 PASSED [0.0696s] [ 76%]
2025-12-04T14:00:07.9900349Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float16 PASSED [0.0656s] [ 76%]
2025-12-04T14:00:07.9900729Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float32 PASSED [0.0656s] [ 76%]
2025-12-04T14:00:07.9901106Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float64 PASSED [0.0655s] [ 76%]
2025-12-04T14:00:07.9901472Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int16 PASSED [0.0502s] [ 76%]
2025-12-04T14:00:07.9901846Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int32 PASSED [0.0498s] [ 76%]
2025-12-04T14:00:07.9902210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int64 PASSED [0.0506s] [ 77%]
2025-12-04T14:00:07.9902580Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int8 PASSED [0.0502s] [ 77%]
2025-12-04T14:00:07.9902949Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_uint8 PASSED [0.0503s] [ 77%]
2025-12-04T14:00:07.9903327Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bfloat16 PASSED [0.1075s] [ 77%]
2025-12-04T14:00:07.9903697Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bool PASSED [0.0925s] [ 77%]
2025-12-04T14:00:07.9904090Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex128 PASSED [0.1107s] [ 77%]
2025-12-04T14:00:07.9904489Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex64 PASSED [0.1111s] [ 77%]
2025-12-04T14:00:07.9904864Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float16 PASSED [0.1072s] [ 77%]
2025-12-04T14:00:07.9905300Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float32 PASSED [0.1075s] [ 77%]
2025-12-04T14:00:07.9905677Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float64 PASSED [0.1071s] [ 77%]
2025-12-04T14:00:07.9906120Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int16 PASSED [0.0923s] [ 77%]
2025-12-04T14:00:07.9906496Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int32 PASSED [0.0919s] [ 77%]
2025-12-04T14:00:07.9906864Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int64 PASSED [0.0926s] [ 77%]
2025-12-04T14:00:07.9907229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int8 PASSED [0.0920s] [ 77%]
2025-12-04T14:00:07.9907600Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_uint8 PASSED [0.0921s] [ 78%]
2025-12-04T14:00:07.9908239Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bfloat16 PASSED [0.1068s] [ 78%]
2025-12-04T14:00:07.9908632Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bool PASSED [0.0917s] [ 78%]
2025-12-04T14:00:07.9909029Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex128 PASSED [0.1101s] [ 78%]
2025-12-04T14:00:07.9909418Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex64 PASSED [0.1107s] [ 78%]
2025-12-04T14:00:07.9909883Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float16 PASSED [0.1062s] [ 78%]
2025-12-04T14:00:07.9910258Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float32 PASSED [0.1065s] [ 78%]
2025-12-04T14:00:07.9910685Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float64 PASSED [0.1066s] [ 78%]
2025-12-04T14:00:07.9911056Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int16 PASSED [0.0912s] [ 78%]
2025-12-04T14:00:07.9911426Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int32 PASSED [0.0917s] [ 78%]
2025-12-04T14:00:07.9911797Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int64 PASSED [0.0919s] [ 78%]
2025-12-04T14:00:07.9912161Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int8 PASSED [0.0916s] [ 78%]
2025-12-04T14:00:07.9912526Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_uint8 PASSED [0.0917s] [ 78%]
2025-12-04T14:00:07.9912910Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bfloat16 PASSED [0.0726s] [ 78%]
2025-12-04T14:00:07.9913275Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bool PASSED [0.0642s] [ 79%]
2025-12-04T14:00:07.9913673Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex128 PASSED [0.0744s] [ 79%]
2025-12-04T14:00:07.9914063Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex64 PASSED [0.0749s] [ 79%]
2025-12-04T14:00:07.9914442Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float16 PASSED [0.0725s] [ 79%]
2025-12-04T14:00:07.9914821Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float32 PASSED [0.0726s] [ 79%]
2025-12-04T14:00:07.9915193Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float64 PASSED [0.0724s] [ 79%]
2025-12-04T14:00:07.9915574Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int16 PASSED [0.0643s] [ 79%]
2025-12-04T14:00:07.9915942Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int32 PASSED [0.0640s] [ 79%]
2025-12-04T14:00:07.9916309Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int64 PASSED [0.0646s] [ 79%]
2025-12-04T14:00:07.9916735Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int8 PASSED [0.0640s] [ 79%]
2025-12-04T14:00:07.9917104Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_uint8 PASSED [0.0640s] [ 79%]
2025-12-04T14:00:07.9917544Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bfloat16 PASSED [0.0718s] [ 79%]
2025-12-04T14:00:07.9917910Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bool PASSED [0.0635s] [ 79%]
2025-12-04T14:00:07.9918305Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex128 PASSED [0.0736s] [ 79%]
2025-12-04T14:00:07.9918751Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex64 PASSED [0.0742s] [ 80%]
2025-12-04T14:00:07.9919128Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float16 PASSED [0.0718s] [ 80%]
2025-12-04T14:00:07.9919511Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float32 PASSED [0.0718s] [ 80%]
2025-12-04T14:00:07.9919890Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float64 PASSED [0.0718s] [ 80%]
2025-12-04T14:00:07.9920256Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int16 PASSED [0.0633s] [ 80%]
2025-12-04T14:00:07.9920627Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int32 PASSED [0.0635s] [ 80%]
2025-12-04T14:00:07.9921033Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int64 PASSED [0.0639s] [ 80%]
2025-12-04T14:00:07.9921403Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int8 PASSED [0.0634s] [ 80%]
2025-12-04T14:00:07.9921809Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_uint8 PASSED [0.0635s] [ 80%]
2025-12-04T14:00:07.9922189Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bfloat16 PASSED [0.0611s] [ 80%]
2025-12-04T14:00:07.9922560Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bool PASSED [0.0526s] [ 80%]
2025-12-04T14:00:07.9922959Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex128 PASSED [0.0629s] [ 80%]
2025-12-04T14:00:07.9923352Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex64 PASSED [0.0635s] [ 80%]
2025-12-04T14:00:07.9923729Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float16 PASSED [0.0611s] [ 80%]
2025-12-04T14:00:07.9924101Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float32 PASSED [0.0611s] [ 81%]
2025-12-04T14:00:07.9924485Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float64 PASSED [0.0612s] [ 81%]
2025-12-04T14:00:07.9924853Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int16 PASSED [0.0527s] [ 81%]
2025-12-04T14:00:07.9925234Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int32 PASSED [0.0526s] [ 81%]
2025-12-04T14:00:07.9925606Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int64 PASSED [0.0532s] [ 81%]
2025-12-04T14:00:07.9925969Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int8 PASSED [0.0526s] [ 81%]
2025-12-04T14:00:07.9926343Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_uint8 PASSED [0.0527s] [ 81%]
2025-12-04T14:00:07.9926725Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bfloat16 PASSED [0.0610s] [ 81%]
2025-12-04T14:00:07.9927095Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bool PASSED [0.0527s] [ 81%]
2025-12-04T14:00:07.9927484Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex128 PASSED [0.0629s] [ 81%]
2025-12-04T14:00:07.9927918Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex64 PASSED [0.0636s] [ 81%]
2025-12-04T14:00:07.9928302Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float16 PASSED [0.0610s] [ 81%]
2025-12-04T14:00:07.9928749Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float32 PASSED [0.0611s] [ 81%]
2025-12-04T14:00:07.9929145Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float64 PASSED [0.0609s] [ 81%]
2025-12-04T14:00:07.9929512Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int16 PASSED [0.0526s] [ 82%]
2025-12-04T14:00:07.9929885Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int32 PASSED [0.0526s] [ 82%]
2025-12-04T14:00:07.9930255Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int64 PASSED [0.0530s] [ 82%]
2025-12-04T14:00:07.9930624Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int8 PASSED [0.0525s] [ 82%]
2025-12-04T14:00:07.9930995Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_uint8 PASSED [0.0525s] [ 82%]
2025-12-04T14:00:07.9931375Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bfloat16 PASSED [0.1002s] [ 82%]
2025-12-04T14:00:07.9931739Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bool PASSED [0.0861s] [ 82%]
2025-12-04T14:00:07.9932182Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex128 PASSED [0.1040s] [ 82%]
2025-12-04T14:00:07.9932571Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex64 PASSED [0.1045s] [ 82%]
2025-12-04T14:00:07.9932984Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float16 PASSED [0.1009s] [ 82%]
2025-12-04T14:00:07.9933368Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float32 PASSED [0.1007s] [ 82%]
2025-12-04T14:00:07.9933744Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float64 PASSED [0.1007s] [ 82%]
2025-12-04T14:00:07.9934116Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int16 PASSED [0.0856s] [ 82%]
2025-12-04T14:00:07.9934480Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int32 PASSED [0.0859s] [ 82%]
2025-12-04T14:00:07.9934849Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int64 PASSED [0.0864s] [ 83%]
2025-12-04T14:00:07.9935217Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int8 PASSED [0.0861s] [ 83%]
2025-12-04T14:00:07.9935586Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_uint8 PASSED [0.0856s] [ 83%]
2025-12-04T14:00:07.9935970Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bfloat16 PASSED [0.1006s] [ 83%]
2025-12-04T14:00:07.9936343Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bool PASSED [0.0859s] [ 83%]
2025-12-04T14:00:07.9936737Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex128 PASSED [0.1039s] [ 83%]
2025-12-04T14:00:07.9937132Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex64 PASSED [0.1047s] [ 83%]
2025-12-04T14:00:07.9937506Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float16 PASSED [0.1007s] [ 83%]
2025-12-04T14:00:07.9937887Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float32 PASSED [0.1002s] [ 83%]
2025-12-04T14:00:07.9938266Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float64 PASSED [0.1004s] [ 83%]
2025-12-04T14:00:07.9938658Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int16 PASSED [0.0859s] [ 83%]
2025-12-04T14:00:07.9939160Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int32 PASSED [0.0860s] [ 83%]
2025-12-04T14:00:07.9939527Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int64 PASSED [0.0866s] [ 83%]
2025-12-04T14:00:07.9939952Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int8 PASSED [0.0859s] [ 83%]
2025-12-04T14:00:07.9940321Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_uint8 PASSED [0.0858s] [ 84%]
2025-12-04T14:00:07.9940702Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bfloat16 PASSED [0.1074s] [ 84%]
2025-12-04T14:00:07.9941076Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bool PASSED [0.0922s] [ 84%]
2025-12-04T14:00:07.9941468Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex128 PASSED [0.1106s] [ 84%]
2025-12-04T14:00:07.9941866Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex64 PASSED [0.1110s] [ 84%]
2025-12-04T14:00:07.9942242Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float16 PASSED [0.1072s] [ 84%]
2025-12-04T14:00:07.9942617Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float32 PASSED [0.1068s] [ 84%]
2025-12-04T14:00:07.9942999Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float64 PASSED [0.1073s] [ 84%]
2025-12-04T14:00:07.9943408Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int16 PASSED [0.0919s] [ 84%]
2025-12-04T14:00:07.9943784Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int32 PASSED [0.0921s] [ 84%]
2025-12-04T14:00:07.9944190Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int64 PASSED [0.0928s] [ 84%]
2025-12-04T14:00:07.9944553Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int8 PASSED [0.0920s] [ 84%]
2025-12-04T14:00:07.9944930Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_uint8 PASSED [0.0922s] [ 84%]
2025-12-04T14:00:07.9945308Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bfloat16 PASSED [0.1066s] [ 84%]
2025-12-04T14:00:07.9945686Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bool PASSED [0.0917s] [ 85%]
2025-12-04T14:00:07.9946080Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex128 PASSED [0.1098s] [ 85%]
2025-12-04T14:00:07.9946466Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex64 PASSED [0.1104s] [ 85%]
2025-12-04T14:00:07.9946852Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float16 PASSED [0.1063s] [ 85%]
2025-12-04T14:00:07.9947225Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float32 PASSED [0.1064s] [ 85%]
2025-12-04T14:00:07.9947604Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float64 PASSED [0.1063s] [ 85%]
2025-12-04T14:00:07.9947972Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int16 PASSED [0.0915s] [ 85%]
2025-12-04T14:00:07.9948338Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int32 PASSED [0.0917s] [ 85%]
2025-12-04T14:00:07.9948741Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int64 PASSED [0.0919s] [ 85%]
2025-12-04T14:00:07.9949125Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int8 PASSED [0.0916s] [ 85%]
2025-12-04T14:00:07.9949498Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_uint8 PASSED [0.0917s] [ 85%]
2025-12-04T14:00:07.9949877Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bfloat16 PASSED [0.0658s] [ 85%]
2025-12-04T14:00:07.9950312Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bool PASSED [0.0503s] [ 85%]
2025-12-04T14:00:07.9950707Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex128 PASSED [0.0691s] [ 85%]
2025-12-04T14:00:07.9951210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex64 PASSED [0.0696s] [ 86%]
2025-12-04T14:00:07.9951592Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float16 PASSED [0.0657s] [ 86%]
2025-12-04T14:00:07.9951968Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float32 PASSED [0.0657s] [ 86%]
2025-12-04T14:00:07.9952342Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float64 PASSED [0.0655s] [ 86%]
2025-12-04T14:00:07.9952715Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int16 PASSED [0.0504s] [ 86%]
2025-12-04T14:00:07.9953079Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int32 PASSED [0.0503s] [ 86%]
2025-12-04T14:00:07.9953452Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int64 PASSED [0.0509s] [ 86%]
2025-12-04T14:00:07.9953818Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int8 PASSED [0.0503s] [ 86%]
2025-12-04T14:00:07.9954186Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_uint8 PASSED [0.0504s] [ 86%]
2025-12-04T14:00:07.9954616Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bfloat16 PASSED [0.0656s] [ 86%]
2025-12-04T14:00:07.9954986Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bool PASSED [0.0503s] [ 86%]
2025-12-04T14:00:07.9955422Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex128 PASSED [0.0691s] [ 86%]
2025-12-04T14:00:07.9955818Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex64 PASSED [0.0696s] [ 86%]
2025-12-04T14:00:07.9956196Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float16 PASSED [0.0656s] [ 86%]
2025-12-04T14:00:07.9956580Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float32 PASSED [0.0656s] [ 87%]
2025-12-04T14:00:07.9956954Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float64 PASSED [0.0654s] [ 87%]
2025-12-04T14:00:07.9957322Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int16 PASSED [0.0503s] [ 87%]
2025-12-04T14:00:07.9957693Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int32 PASSED [0.0502s] [ 87%]
2025-12-04T14:00:07.9958064Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int64 PASSED [0.0506s] [ 87%]
2025-12-04T14:00:07.9958433Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int8 PASSED [0.0502s] [ 87%]
2025-12-04T14:00:07.9958832Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_uint8 PASSED [0.0501s] [ 87%]
2025-12-04T14:00:07.9959230Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bfloat16 PASSED [0.0483s] [ 87%]
2025-12-04T14:00:07.9959586Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bool PASSED [0.0414s] [ 87%]
2025-12-04T14:00:07.9959967Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex128 PASSED [0.0499s] [ 87%]
2025-12-04T14:00:07.9960347Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex64 PASSED [0.0503s] [ 87%]
2025-12-04T14:00:07.9960711Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float16 PASSED [0.0484s] [ 87%]
2025-12-04T14:00:07.9961073Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float32 PASSED [0.0482s] [ 87%]
2025-12-04T14:00:07.9961484Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float64 PASSED [0.0482s] [ 88%]
2025-12-04T14:00:07.9961839Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int16 PASSED [0.0413s] [ 88%]
2025-12-04T14:00:07.9962237Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int32 PASSED [0.0413s] [ 88%]
2025-12-04T14:00:07.9962595Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int64 PASSED [0.0417s] [ 88%]
2025-12-04T14:00:07.9962952Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int8 PASSED [0.0412s] [ 88%]
2025-12-04T14:00:07.9963319Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_uint8 PASSED [0.0412s] [ 88%]
2025-12-04T14:00:07.9963687Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bfloat16 PASSED [0.0479s] [ 88%]
2025-12-04T14:00:07.9964048Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bool PASSED [0.0413s] [ 88%]
2025-12-04T14:00:07.9964432Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex128 PASSED [0.0496s] [ 88%]
2025-12-04T14:00:07.9964806Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex64 PASSED [0.0506s] [ 88%]
2025-12-04T14:00:07.9965175Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float16 PASSED [0.0483s] [ 88%]
2025-12-04T14:00:07.9965588Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float32 PASSED [0.0482s] [ 88%]
2025-12-04T14:00:07.9965962Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float64 PASSED [0.0481s] [ 88%]
2025-12-04T14:00:07.9966360Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int16 PASSED [0.0413s] [ 88%]
2025-12-04T14:00:07.9966715Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int32 PASSED [0.0413s] [ 89%]
2025-12-04T14:00:07.9967084Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int64 PASSED [0.0416s] [ 89%]
2025-12-04T14:00:07.9967439Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int8 PASSED [0.0413s] [ 89%]
2025-12-04T14:00:07.9967802Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_uint8 PASSED [0.0412s] [ 89%]
2025-12-04T14:00:07.9968169Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bfloat16 PASSED [0.0464s] [ 89%]
2025-12-04T14:00:07.9968525Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bool PASSED [0.0396s] [ 89%]
2025-12-04T14:00:07.9968912Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex128 PASSED [0.0480s] [ 89%]
2025-12-04T14:00:07.9969286Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex64 PASSED [0.0485s] [ 89%]
2025-12-04T14:00:07.9969656Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float16 PASSED [0.0465s] [ 89%]
2025-12-04T14:00:07.9970022Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float32 PASSED [0.0464s] [ 89%]
2025-12-04T14:00:07.9970389Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float64 PASSED [0.0464s] [ 89%]
2025-12-04T14:00:07.9970750Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int16 PASSED [0.0396s] [ 89%]
2025-12-04T14:00:07.9971109Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int32 PASSED [0.0395s] [ 89%]
2025-12-04T14:00:07.9971472Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int64 PASSED [0.0399s] [ 89%]
2025-12-04T14:00:07.9971826Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int8 PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9972179Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_uint8 PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9972598Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bfloat16 PASSED [0.0464s] [ 90%]
2025-12-04T14:00:07.9972991Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bool PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9973370Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex128 PASSED [0.0481s] [ 90%]
2025-12-04T14:00:07.9973752Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex64 PASSED [0.0485s] [ 90%]
2025-12-04T14:00:07.9974115Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float16 PASSED [0.0465s] [ 90%]
2025-12-04T14:00:07.9974485Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float32 PASSED [0.0463s] [ 90%]
2025-12-04T14:00:07.9974847Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float64 PASSED [0.0462s] [ 90%]
2025-12-04T14:00:07.9975203Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int16 PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9975563Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int32 PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9975917Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int64 PASSED [0.0399s] [ 90%]
2025-12-04T14:00:07.9976273Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int8 PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9976670Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_uint8 PASSED [0.0395s] [ 90%]
2025-12-04T14:00:07.9977078Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bfloat16 PASSED [0.0936s] [ 91%]
2025-12-04T14:00:07.9977438Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bool PASSED [0.0788s] [ 91%]
2025-12-04T14:00:07.9977817Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex128 PASSED [0.0969s] [ 91%]
2025-12-04T14:00:07.9978197Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex64 PASSED [0.0973s] [ 91%]
2025-12-04T14:00:07.9978567Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float16 PASSED [0.0933s] [ 91%]
2025-12-04T14:00:07.9978980Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float32 PASSED [0.0933s] [ 91%]
2025-12-04T14:00:07.9979404Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float64 PASSED [0.0933s] [ 91%]
2025-12-04T14:00:07.9979762Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int16 PASSED [0.0788s] [ 91%]
2025-12-04T14:00:07.9980122Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int32 PASSED [0.0786s] [ 91%]
2025-12-04T14:00:07.9980479Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int64 PASSED [0.0791s] [ 91%]
2025-12-04T14:00:07.9980831Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int8 PASSED [0.0789s] [ 91%]
2025-12-04T14:00:07.9981193Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_uint8 PASSED [0.0786s] [ 91%]
2025-12-04T14:00:07.9981561Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bfloat16 PASSED [0.0935s] [ 91%]
2025-12-04T14:00:07.9981919Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bool PASSED [0.0787s] [ 91%]
2025-12-04T14:00:07.9982299Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex128 PASSED [0.0966s] [ 92%]
2025-12-04T14:00:07.9982675Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex64 PASSED [0.0970s] [ 92%]
2025-12-04T14:00:07.9983048Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float16 PASSED [0.0933s] [ 92%]
2025-12-04T14:00:07.9983460Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float32 PASSED [0.0932s] [ 92%]
2025-12-04T14:00:07.9983869Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float64 PASSED [0.0933s] [ 92%]
2025-12-04T14:00:07.9984229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int16 PASSED [0.0787s] [ 92%]
2025-12-04T14:00:07.9984585Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int32 PASSED [0.0786s] [ 92%]
2025-12-04T14:00:07.9984948Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int64 PASSED [0.0791s] [ 92%]
2025-12-04T14:00:07.9985304Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int8 PASSED [0.0786s] [ 92%]
2025-12-04T14:00:07.9985674Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_uint8 PASSED [0.0788s] [ 92%]
2025-12-04T14:00:07.9986043Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bfloat16 PASSED [0.0468s] [ 92%]
2025-12-04T14:00:07.9986400Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bool PASSED [0.0396s] [ 92%]
2025-12-04T14:00:07.9986785Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex128 PASSED [0.0478s] [ 92%]
2025-12-04T14:00:07.9987157Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex64 PASSED [0.0483s] [ 92%]
2025-12-04T14:00:07.9991112Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float16 PASSED [0.0463s] [ 93%]
2025-12-04T14:00:07.9991509Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float32 PASSED [0.0462s] [ 93%]
2025-12-04T14:00:07.9991945Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float64 PASSED [0.0462s] [ 93%]
2025-12-04T14:00:07.9992304Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int16 PASSED [0.0393s] [ 93%]
2025-12-04T14:00:07.9992660Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int32 PASSED [0.0393s] [ 93%]
2025-12-04T14:00:07.9993017Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int64 PASSED [0.0397s] [ 93%]
2025-12-04T14:00:07.9993374Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int8 PASSED [0.0394s] [ 93%]
2025-12-04T14:00:07.9993727Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_uint8 PASSED [0.0393s] [ 93%]
2025-12-04T14:00:07.9994098Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bfloat16 PASSED [0.0461s] [ 93%]
2025-12-04T14:00:07.9994452Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bool PASSED [0.0393s] [ 93%]
2025-12-04T14:00:07.9994829Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex128 PASSED [0.0477s] [ 93%]
2025-12-04T14:00:07.9995206Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex64 PASSED [0.0485s] [ 93%]
2025-12-04T14:00:07.9995569Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float16 PASSED [0.0463s] [ 93%]
2025-12-04T14:00:07.9995933Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float32 PASSED [0.0462s] [ 93%]
2025-12-04T14:00:07.9996294Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float64 PASSED [0.0463s] [ 94%]
2025-12-04T14:00:07.9996648Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int16 PASSED [0.0394s] [ 94%]
2025-12-04T14:00:07.9997007Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int32 PASSED [0.0394s] [ 94%]
2025-12-04T14:00:07.9997359Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int64 PASSED [0.0398s] [ 94%]
2025-12-04T14:00:07.9997711Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int8 PASSED [0.0393s] [ 94%]
2025-12-04T14:00:07.9998113Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_uint8 PASSED [0.0394s] [ 94%]
2025-12-04T14:00:07.9998550Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bfloat16 PASSED [0.0445s] [ 94%]
2025-12-04T14:00:07.9998933Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bool PASSED [0.0377s] [ 94%]
2025-12-04T14:00:07.9999313Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex128 PASSED [0.0461s] [ 94%]
2025-12-04T14:00:07.9999688Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex64 PASSED [0.0467s] [ 94%]
2025-12-04T14:00:08.0000052Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float16 PASSED [0.0446s] [ 94%]
2025-12-04T14:00:08.0000413Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float32 PASSED [0.0444s] [ 94%]
2025-12-04T14:00:08.0000781Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float64 PASSED [0.0446s] [ 94%]
2025-12-04T14:00:08.0001136Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int16 PASSED [0.0377s] [ 94%]
2025-12-04T14:00:08.0001494Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int32 PASSED [0.0376s] [ 95%]
2025-12-04T14:00:08.0001848Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int64 PASSED [0.0380s] [ 95%]
2025-12-04T14:00:08.0002242Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int8 PASSED [0.0376s] [ 95%]
2025-12-04T14:00:08.0002639Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_uint8 PASSED [0.0377s] [ 95%]
2025-12-04T14:00:08.0003005Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bfloat16 PASSED [0.0446s] [ 95%]
2025-12-04T14:00:08.0003365Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bool PASSED [0.0377s] [ 95%]
2025-12-04T14:00:08.0003743Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex128 PASSED [0.0460s] [ 95%]
2025-12-04T14:00:08.0004116Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex64 PASSED [0.0467s] [ 95%]
2025-12-04T14:00:08.0004480Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float16 PASSED [0.0446s] [ 95%]
2025-12-04T14:00:08.0004842Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float32 PASSED [0.0445s] [ 95%]
2025-12-04T14:00:08.0005208Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float64 PASSED [0.0446s] [ 95%]
2025-12-04T14:00:08.0005562Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int16 PASSED [0.0377s] [ 95%]
2025-12-04T14:00:08.0005915Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int32 PASSED [0.0376s] [ 95%]
2025-12-04T14:00:08.0006273Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int64 PASSED [0.0381s] [ 95%]
2025-12-04T14:00:08.0006625Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int8 PASSED [0.0376s] [ 96%]
2025-12-04T14:00:08.0006979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_uint8 PASSED [0.0377s] [ 96%]
2025-12-04T14:00:08.0007414Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSC_cuda_float64 SKIPPED [0.0014s] (Only runs on cpu) [ 96%]
2025-12-04T14:00:08.0008061Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSR_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%]
2025-12-04T14:00:08.0008513Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCOO_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%]
2025-12-04T14:00:08.0008972Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSC_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%]
2025-12-04T14:00:08.0009479Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSR_cuda_float64 SKIPPED [0.0018s] (Only runs on cpu) [ 96%]
2025-12-04T14:00:08.0009957Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_Strided_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%]
2025-12-04T14:00:08.0010399Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSC_cuda PASSED [0.0017s] [ 96%]
2025-12-04T14:00:08.0010844Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSR_cuda PASSED [0.0020s] [ 96%]
2025-12-04T14:00:08.0011281Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCOO_cuda PASSED [0.0017s] [ 96%]
2025-12-04T14:00:08.0011719Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSC_cuda PASSED [0.0016s] [ 96%]
2025-12-04T14:00:08.0012157Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSR_cuda PASSED [0.0017s] [ 96%]
2025-12-04T14:00:08.0012588Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_Strided_cuda PASSED [0.0019s] [ 96%]
2025-12-04T14:00:08.0013017Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSC_cuda PASSED [0.0019s] [ 97%]
2025-12-04T14:00:08.0013442Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSR_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0013927Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCOO_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0014404Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSC_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0014828Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0015246Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_Strided_cuda PASSED [0.0019s] [ 97%]
2025-12-04T14:00:08.0015682Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSC_cuda PASSED [0.0020s] [ 97%]
2025-12-04T14:00:08.0016117Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0016550Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCOO_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0016982Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSC_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0017419Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0017843Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_Strided_cuda PASSED [0.0014s] [ 97%]
2025-12-04T14:00:08.0018284Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSC_cuda PASSED [0.0023s] [ 97%]
2025-12-04T14:00:08.0018725Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0019209Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCOO_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0019652Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSC_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0020090Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSR_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0020520Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_Strided_cuda PASSED [0.0015s] [ 98%]
2025-12-04T14:00:08.0020989Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSC_cuda PASSED [0.0023s] [ 98%]
2025-12-04T14:00:08.0021451Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSR_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0021874Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCOO_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0022292Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSC_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0022719Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSR_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0023129Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_Strided_cuda PASSED [0.0014s] [ 98%]
2025-12-04T14:00:08.0023567Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSC_cuda PASSED [0.0023s] [ 98%]
2025-12-04T14:00:08.0024010Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSR_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0024447Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCOO_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0024885Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSC_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0025407Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSR_cuda PASSED [0.0017s] [ 99%]
2025-12-04T14:00:08.0025838Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_Strided_cuda PASSED [0.0014s] [ 99%]
2025-12-04T14:00:08.0026314Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0026746Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSR_cuda PASSED [0.0026s] [ 99%]
2025-12-04T14:00:08.0027188Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCOO_cuda PASSED [0.0017s] [ 99%]
2025-12-04T14:00:08.0027622Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0028053Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSR_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0028506Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_Strided_cuda PASSED [0.0015s] [ 99%]
2025-12-04T14:00:08.0028947Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0029367Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSR_cuda PASSED [0.0020s] [ 99%]
2025-12-04T14:00:08.0029780Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCOO_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0030198Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0030614Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSR_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0031016Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_Strided_cuda PASSED [0.0016s] [100%]
2025-12-04T14:00:08.0031022Z 
2025-12-04T14:00:08.0031525Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml -
2025-12-04T14:00:08.0031729Z ======== 1199 passed, 193 skipped, 1708 deselected in 651.49s (0:10:51) ========
2025-12-04T14:00:08.0032441Z The following tests failed consistently: ['test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64', 'test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64']
2025-12-04T14:00:08.0032500Z 
2025-12-04T14:00:08.0032832Z FINISHED PRINTING LOG FILE of test_sparse 1/1 (test/test-reports/test_sparse_1.1_e217f60a40d48402_.log)
2025-12-04T14:00:08.0032837Z 
2025-12-04T14:00:08.0033104Z Finished test_sparse 1/1 ... [2025-12-04 14:00:07.585002][17279.594224164], took 15.45min
2025-12-04T14:00:08.0033646Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml
2025-12-04T14:00:08.0034179Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml
2025-12-04T14:00:08.0034708Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml
2025-12-04T14:00:08.0035233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml
2025-12-04T14:00:08.0035784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml
2025-12-04T14:00:08.0036314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml
2025-12-04T14:00:08.0036841Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml
2025-12-04T14:00:08.4830711Z Uploading logs for 57118183212 to S3
2025-12-04T14:00:08.6549021Z Uploading artifacts took 0.72 seconds
2025-12-04T14:00:08.6549334Z test_sparse 1/1 failed!
2025-12-04T14:00:08.6552669Z Running test_ci_sanity_check_fail 1/1 ... [2025-12-04 14:00:08.654967][17280.664189554]
2025-12-04T14:00:08.6553140Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:00:08.6557183Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ci_sanity_check_fail.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:00:08.655374]
2025-12-04T14:00:23.4921403Z Finished test_ci_sanity_check_fail 1/1 ... [2025-12-04 14:00:23.491617][17295.500837836], took 0.25min
2025-12-04T14:00:23.5094510Z Running test_ops_fwd_gradients 6/12 ... [2025-12-04 14:00:23.508944][17295.518170471]
2025-12-04T14:00:23.5095143Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:00:23.5096307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_fwd_gradients.py', '--shard-id=6', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:00:23.509263]
2025-12-04T14:12:56.1077563Z 
2025-12-04T14:12:56.1078719Z test_ops_fwd_gradients 6/12 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_fwd_gradients_6.12_abead446b517b77f_.log
2025-12-04T14:12:56.1254378Z Running 276 items in this shard: test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cartesian_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chalf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cosh_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_floor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isreal_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_full_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softsign_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_threshold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scalar_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_searchsorted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_mm_reduce_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sum_to_size_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tile_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_where_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__chunk_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_char_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_max_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_constant_pad_nd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_geqrf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorsolve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logaddexp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmean_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_inf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_nuc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_number_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_4_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsqrt_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_to_size_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_to_size_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapezoid_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_count_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_qr_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mT_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_variadic_tensors_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_circular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pinverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_real_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reciprocal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sigmoid_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_square_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_to_size_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unflatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vsplit_cuda_float64
2025-12-04T14:12:56.1427600Z 
2025-12-04T14:12:56.1427912Z Finished test_ops_fwd_gradients 6/12 ... [2025-12-04 14:12:56.107871][18048.117094123], took 12.54min
2025-12-04T14:12:56.1428976Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-95ccd07868721469.xml
2025-12-04T14:12:56.2244715Z Running test_ops_gradients 2/10 ... [2025-12-04 14:12:56.224071][18048.233293292]
2025-12-04T14:12:56.2245355Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:12:56.2248317Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=2', '--num-shards=10', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:12:56.224436]
2025-12-04T14:25:25.8651557Z 
2025-12-04T14:25:25.8653247Z test_ops_gradients 2/10 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_2.10_8b90327e47e16b38_.log
2025-12-04T14:25:25.8873408Z Running 520 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyTakeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__softmax_backward_data_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_chalf_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cummax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geqrf_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorsolve_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_xor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_embedding_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_renorm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_searchsorted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_uniform_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyCubeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cdouble_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_no_rounding_mode_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_norm_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vecdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_multinomial_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_kl_div_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool2d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_circular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_circular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rsub_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_hann_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_chunk_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_H_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clone_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diag_embed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_inner_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_istft_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_long_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softsign_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_static_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_slice_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_split_with_sizes_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_lowrank_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unbind_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_householder_product_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_embedding_bag_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pinverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_like_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rot90_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addr_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ihfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flip_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logaddexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_or_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_logsigmoid_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_erfcx_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_i0e_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vsplit_cuda_complex128
2025-12-04T14:25:25.9085989Z 
2025-12-04T14:25:25.9086281Z Finished test_ops_gradients 2/10 ... [2025-12-04 14:25:25.866059][18797.875283273], took 12.49min
2025-12-04T14:25:25.9087358Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-1e96fc6cc9093b07.xml
2025-12-04T14:25:26.5873869Z Uploading artifacts took 0.62 seconds
2025-12-04T14:25:26.5877040Z Running test_ops_gradients 10/10 ... [2025-12-04 14:25:26.587365][18798.596588143]
2025-12-04T14:25:26.5877648Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:25:26.5881704Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=10', '--num-shards=10', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:25:26.587769]
2025-12-04T14:39:29.0135786Z 
2025-12-04T14:39:29.0137372Z test_ops_gradients 10/10 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_10.10_690d4f6748dd1bf7_.log
2025-12-04T14:39:29.0384416Z Running 574 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dist_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eye_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vander_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vector_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logaddexp2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logaddexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_batch_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_prelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_inf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_zeta_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___radd___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cov_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dsplit_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_tensor_overload_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_xor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_alpha_dropout_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_inf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reciprocal_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triangular_solve_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tril_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bucketize_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_logsumexp_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_maximum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_selu_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_in_place_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_round_decimals_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scan_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_partial_views_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cartesian_prod_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_add_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vander_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_fill_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_max_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_grid_sample_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_mse_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_relu6_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_copy_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_constant_pad_nd_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_int_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_or_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lt_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_dropout3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_instance_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_blackman_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_complex128
2025-12-04T14:39:29.0620055Z 
2025-12-04T14:39:29.0620350Z Finished test_ops_gradients 10/10 ... [2025-12-04 14:39:29.015053][19641.024277061], took 14.04min
2025-12-04T14:39:29.0621431Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-91f289dc18834c3e.xml
2025-12-04T14:39:29.1227230Z Running functorch/test_ops 3/6 ... [2025-12-04 14:39:29.122334][19641.13155584]
2025-12-04T14:39:29.1227771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:39:29.1230253Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=3', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:39:29.122661]
2025-12-04T14:52:58.8503328Z 
2025-12-04T14:52:58.8504350Z functorch/test_ops 3/6 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_3.6_4e22832cb04fe87a_.log
2025-12-04T14:52:58.9188635Z Running 1655 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_data_write_errors_under_transform_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorinv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_bartlett_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hash_tensor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_unpack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__softmax_backward_data_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_softmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_constant_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_where_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_unbind_grad_op_jvp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mT_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_permute_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_batch_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pdist_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_laguerre_polynomial_l_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__native_batch_norm_legit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logcumsumexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gaussian_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpySortAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acos_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_combinations_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_det_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_grad_oriented_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_huber_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_local_response_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rrelu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_smooth_l1_loss_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_tanhshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_neg_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sgn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_t_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_u_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_chunk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__upsample_bilinear2d_aa_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_floor_divide_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linspace_tensor_overload_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rsqrt_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hann_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_std_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_v_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__chunk_cat_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_as_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cauchy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_ctc_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_circular_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hsplit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_permute_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_char_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nextafter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32
2025-12-04T14:52:58.9850113Z 
2025-12-04T14:52:58.9850410Z Finished functorch/test_ops 3/6 ... [2025-12-04 14:52:58.852897][20450.862119719], took 13.50min
2025-12-04T14:52:58.9851437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-05b5b699aba88456.xml
2025-12-04T14:52:59.5613333Z Uploading artifacts took 0.58 seconds
2025-12-04T14:52:59.5617211Z Running dynamo/test_after_aot 1/1 ... [2025-12-04 14:52:59.561438][20451.570660896]
2025-12-04T14:52:59.5617676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:52:59.5621977Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_after_aot.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:52:59.561849]
2025-12-04T14:53:08.3936257Z 
2025-12-04T14:53:08.3937128Z dynamo/test_after_aot 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_after_aot_1.1_e8843ead62c525f1_.log
2025-12-04T14:53:08.3938571Z Running 2 items in this shard: test/dynamo/test_after_aot.py::TestAfterAot::test_dump_tensor, test/dynamo/test_after_aot.py::TestAfterAot::test_save_graph_repro
2025-12-04T14:53:08.3939462Z 
2025-12-04T14:53:08.3939744Z Finished dynamo/test_after_aot 1/1 ... [2025-12-04 14:53:08.393221][20460.402445603], took 0.15min
2025-12-04T14:53:08.4115436Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_after_aot/dynamo.test_after_aot-138e4478191117d7.xml
2025-12-04T14:53:08.4879423Z Running inductor/test_snode_runtime 1/1 ... [2025-12-04 14:53:08.487554][20460.496776868]
2025-12-04T14:53:08.4879993Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:53:08.4882416Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_snode_runtime.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:53:08.487869]
2025-12-04T14:53:24.5332558Z 
2025-12-04T14:53:24.5334026Z inductor/test_snode_runtime 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_snode_runtime_1.1_f8102af9af532885_.log
2025-12-04T14:53:24.5347739Z Running 22 items in this shard: test/inductor/test_snode_runtime.py::UnsupportedTests::test_no_cuda, test/inductor/test_snode_runtime.py::UnsupportedTests::test_no_op, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_addmm, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_bmm, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv1d, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv2d, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv2d_transpose, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv3d, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_mm, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_dynamic, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_horizontal_reduction_pointwise, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_pointwise, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_relu, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_gather_into_tensor, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_gather_into_tensor_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_reduce, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_reduce_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_legacy_all_gather_into_tensor_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_legacy_all_reduce, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_legacy_all_reduce_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_reduce_scatter_tensor, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_reduce_scatter_tensor_coalesced
2025-12-04T14:53:24.5358709Z 
2025-12-04T14:53:24.5359185Z Finished inductor/test_snode_runtime 1/1 ... [2025-12-04 14:53:24.532906][20476.542130993], took 0.27min
2025-12-04T14:53:24.5523692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_snode_runtime/inductor.test_snode_runtime-f1ec066e866be26d.xml
2025-12-04T14:53:24.6188163Z Running inductor/test_compiled_autograd 1/2 ... [2025-12-04 14:53:24.618376][20476.627598109]
2025-12-04T14:53:24.6188994Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:53:24.6190920Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:53:24.618718]
2025-12-04T15:01:30.9757799Z 
2025-12-04T15:01:30.9758863Z inductor/test_compiled_autograd 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_1.2_d8737cb5eeb8c364_.log
2025-12-04T15:01:30.9995981Z Running 438 items in this shard: test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_5_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_already_nan, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_backward, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_grad, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_basic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_data_dependent_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_id_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_non_traceable, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_dynamic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_backward_hook_relative_ordering_partial, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cache_hit, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_sac, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compiled_autograd_does_not_specialize_on_bw_symints, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cpu_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_graph, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_cpp_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_python_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_sdpa, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_compiled_fw_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_dynamically_defined_class, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_multiple_grads, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_attr, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_multiple_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_cpp_reducer_error, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_python_reducer, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_disk_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_annotations, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_eager_node, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamo_boxed, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_flex_attention, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory_subclass, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_higher_order_gradients, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_cpp_extension, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_torch, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inplace_grad_update, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inputs_aliasing_bytecode_stack_restore, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_issue106555, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_usage_after_compiled, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logging_tensor_flaky, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_aot_eager, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_tensor_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reset, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_saved_tensor_unpack_hook_ordering, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_only_backward_call, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_function_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_run_with_rng_state, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes_hop, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_cpp, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_snapshot, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_should_not_execute, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_with_zero_numel_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_assign_parent_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_detect_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_mode_no_check_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_view_of_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_creation_meta, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_cross_dtype, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_multiple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_simple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_views_codegen, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_copy, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_create_graph_warns, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_hook_relative_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_retained_graph_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_with_saved_values, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_with_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_calculate_shape_util, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_cant_create_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_detects_non_determinism, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_graph_execution_group, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_valid_reset_on_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_correct_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_custom_function_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_dataparallel, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_memory_savings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_create_graph_and_full_backward_hook_cycle, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_execution_order, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_ac_early_stop, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_no_early_free, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_repeated_grad_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_wrong_formula, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_mark_dirty_not_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_preserve_torch_function_when_return_as_is, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saving_mutated_view_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_simple, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_vmap_defaults, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_deep_reentrant, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dep_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dependent_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_base, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_then_inplace_raises_in_autograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks_nested, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_duplicate_backward_root, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_enable_grad_decorator_no_paren, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_first_grad_fn_access_in_no_grad_mode, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_complicated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_pyfunction, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_get_data_and_hooks_from_raw_saved_variable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_empty_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_input_metadata, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf_register_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_thread_safety, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_materialize, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_unreachable_discovery, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_forward_or_backward_only, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_complex_non_complex_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_custom_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_dense_and_sparse_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout2, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout4, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_test_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_validates_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_graph_save_on_cpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_edge_case_when_called_with_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_none, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks_cpp, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_indexing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_not_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_leaf_errors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_integer_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_legacy_function_deprecation_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_lobpcg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_mark_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward_no_grad, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_named_tensor_for_complex_views, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_anomaly_access, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_autograd_function_stashing_ctx, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_nested_anomaly_printstack_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_next_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_python_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_save, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_not_implemented_fwad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pickle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_returns_not_None, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pow_zero_tensor_gradient, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_power_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_prehook_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_table, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_function_event_avg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_seq_nr, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_shapes, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_child_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_leaf_variable_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad_, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retains_grad_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_none_for_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_on_cpu_and_checkpoint, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_output_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_scalar_grad_mixed_device, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_select_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_tensorimpl_type, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_benign_exceptions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_enabled_wraps, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_generator_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_materialize_non_diff_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_shape, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sharded_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_both_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim_neg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_ind_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_grad_warnings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_thread_shutdown, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_too_many_grads, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unrelated_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unused_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_var_mean_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_view_func_replay_with_modified_state, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_volatile_deprecated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_will_engine_execute_node, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_kwargs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_same_graph_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_True, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_grad_for_nontensor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_no_output, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_tensorlist, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_basic_make_fx, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_basic, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_nms_dynamic_compile, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_defined_in_python, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_duplicate_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_abstract_overload, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_cpu, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_invalid_devices, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_multiple, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_separate, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_supported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_unsupported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_qualname, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_schemas, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_functional_schema, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_tensorlist_like_type, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_define, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_meta_for_data_dependent_shape_operation, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_name_must_match, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_new_data_dependent_symint, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_private_ctor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_param_types, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_symints, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_unsupported_schemas, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_allow_python_side_effects_utility, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_constants, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_input_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_numpy_number, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_tracked, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_untracked_global_nested, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_branches_no_arguments, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_free_variable_in_both_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_graph_break_in_one_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_pytree_operands, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_side_effect_in_one_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_with_constant_pred, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fallback_on_graph_break_simple, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_freevars_as_inputs_to_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_grad_source_fn_stack, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_no_hints, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hopify_generic_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_internal_nonlocal, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensors_with_compound_expressions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_lowers_to_graph, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_multi_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_pytree_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_tuple_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_no_freevars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_output_with_dict, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_register_subclass, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var_used_multiple_times, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_vars, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_local_list_append_no_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_list, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_tensor_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nested_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_nonlocal_module, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_symint_in_slice, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_multiply_scalar, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_allow_local_assign_in_body_fn, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_inductor_compiled_regions_option, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_default_else_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_only, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_recompile, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_source_fn_stack, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call_sequential_params_and_buffers, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_fn_with_kwargs, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_python_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_pytree, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_side_effect, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_two_tensors_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_simple, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_teardown_resets_nested_graph_breaks, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs_python_struct, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_free_const, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_out_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_outputs_diff_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_over_vmap_captured, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_pytree_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_different_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_same_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects_append_input, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs_tuple_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_conditional_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_invalid_kwargs, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_dropout_inductor, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond_unbalanced_branches, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_module, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_non_aliasing_util, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_device_mesh_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_basic_export, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_dynamo_disable, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_graph_break, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_different_gradient_placement, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dont_recompile_on_same_placement_devicemesh, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_loss_parallel_log_softmax, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_slice, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamo_device_mesh_attrs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_graph_output, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_redistribute_unbalanced_correct_strides, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_requires_grad_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate_as_args, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs_forward_hook, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_fakify_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_graph_input_is_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_placement_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_unwrap_async_collective_tensor_tangent, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_cond_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_quant_packed_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_subgraph_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_nested_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_while_loop_simple_cuda_float32
2025-12-04T15:01:31.0220470Z 
2025-12-04T15:01:31.0220818Z Finished inductor/test_compiled_autograd 1/2 ... [2025-12-04 15:01:30.976888][20962.986111824], took 8.11min
2025-12-04T15:01:31.0222028Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-bf57fb8d20e32a72.xml
2025-12-04T15:01:31.0963228Z Running test_testing 1/1 ... [2025-12-04 15:01:31.095949][20963.105171419]
2025-12-04T15:01:31.0963771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:01:31.0966832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_testing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:01:31.096291]
2025-12-04T15:02:21.1544740Z 
2025-12-04T15:02:21.1545572Z test_testing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_testing_1.1_6250d60ab394f89f_.log
2025-12-04T15:02:21.2499977Z Running 2074 items in this shard: test/test_testing.py::TestTestingCUDA::test_assertEqual_longMessage_cuda, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_bool, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int8, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_utils_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_get_supported_dtypes_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_bool, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int64, 
test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_bool_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_equality_shortcut_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float64, test/test_testing.py::TestTestingCUDA::test_setup_and_teardown_run_for_device_specific_tests_cuda, test/test_testing.py::TestTestingCUDA::test_supported_dtypes_abs_cuda, test/test_testing.py::TestFrameworkUtils::test_filtering_env_var, test/test_testing.py::TestAssertClose::test_bool, test/test_testing.py::TestAssertClose::test_default_tolerance_selection_mismatching_dtypes, test/test_testing.py::TestAssertClose::test_docstring_examples, test/test_testing.py::TestAssertClose::test_matching, 
test/test_testing.py::TestAssertClose::test_matching_atol, test/test_testing.py::TestAssertClose::test_matching_conjugate_bit, test/test_testing.py::TestAssertClose::test_matching_nan, test/test_testing.py::TestAssertClose::test_matching_nan_with_equal_nan, test/test_testing.py::TestAssertClose::test_matching_rtol, test/test_testing.py::TestAssertClose::test_meta, test/test_testing.py::TestAssertClose::test_mismatching_dtype, test/test_testing.py::TestAssertClose::test_mismatching_dtype_no_check, test/test_testing.py::TestAssertClose::test_mismatching_layout, test/test_testing.py::TestAssertClose::test_mismatching_layout_no_check, test/test_testing.py::TestAssertClose::test_mismatching_shape, test/test_testing.py::TestAssertClose::test_mismatching_stride, test/test_testing.py::TestAssertClose::test_mismatching_stride_no_check, test/test_testing.py::TestAssertClose::test_mismatching_types, test/test_testing.py::TestAssertClose::test_mismatching_types_subclasses, test/test_testing.py::TestAssertClose::test_mismatching_types_type_equality, test/test_testing.py::TestAssertClose::test_mismatching_values, test/test_testing.py::TestAssertClose::test_mismatching_values_atol, test/test_testing.py::TestAssertClose::test_mismatching_values_rtol, test/test_testing.py::TestAssertClose::test_none, test/test_testing.py::TestAssertClose::test_none_mismatch, test/test_testing.py::TestAssertClose::test_numpy, test/test_testing.py::TestAssertClose::test_only_atol, test/test_testing.py::TestAssertClose::test_only_rtol, test/test_testing.py::TestAssertClose::test_scalar, test/test_testing.py::TestAssertClose::test_unexpected_error_compare, test/test_testing.py::TestAssertClose::test_unexpected_error_originate, test/test_testing.py::TestAssertClose::test_unknown_layout, test/test_testing.py::TestAssertClose::test_unknown_type, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_cuda, 
test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_no_check_cuda, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_atol, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_scalars, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_tensor_likes, test/test_testing.py::TestAssertCloseErrorMessage::test_mismatched_elements, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_callable, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_str, test/test_testing.py::TestAssertCloseErrorMessage::test_not_close, test/test_testing.py::TestAssertCloseErrorMessage::test_not_equal, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_rtol, test/test_testing.py::TestAssertCloseErrorMessage::test_small_float_dtype, test/test_testing.py::TestAssertCloseErrorMessage::test_zero_div_zero, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_keys, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_values_msg, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_len, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_coalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_uncoalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_indices_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_nnz, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_sparse_dims, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_matching, 
test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_matching, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_matching, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_matching, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_channel, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_tensor, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_is_quantized, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_qscheme, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int8, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bfloat16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_uint8, 
test/test_testing.py::TestTestParametrization::test_apply_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_compose_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_default_names, test/test_testing.py::TestTestParametrization::test_modules_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_multiple_handling_of_same_param_error, test/test_testing.py::TestTestParametrization::test_name_fn, test/test_testing.py::TestTestParametrization::test_ops_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_reparametrize, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_1, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_2, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_3, test/test_testing.py::TestTestParametrization::test_subtest_names, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_6, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_name_non_primitive_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_names_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_invalid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_valid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_list_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_decorator_applies_module_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_multiple_handling_of_same_param_error_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_name_fn_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_decorator_applies_op_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_param_specific_decoration_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_1_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_2_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_3_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_4_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_unparametrized_names_cuda, test/test_testing.py::TestImports::test_circular_dependencies, test/test_testing.py::TestImports::test_lazy_imports_are_lazy, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_functorch, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_torch, test/test_testing.py::TestImports::test_no_warning_on_import, test/test_testing.py::TestImports::test_not_import_sympy, test/test_testing.py::TestOpInfos::test_sample_input, test/test_testing.py::TestOpInfos::test_sample_input_metadata, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_T_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___radd___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rand___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rdiv___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmod___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmul___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___ror___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rpow___cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rsub___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rxor___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators__chunk_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_aminmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_arange_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_as_strided_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_atan2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bernoulli_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_left_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_right_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_xor_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bucketize_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cauchy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_max_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_min_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_complex_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_copysign_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cov_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_embed_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diff_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_floor_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_no_rounding_mode_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_trunc_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_empty_permuted_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eye_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft2_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fliplr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_flipud_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_float_power_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_floor_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmod_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gather_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gcd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ge_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_geometric_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gradient_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_heaviside_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_histogramdd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hypot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igamma_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igammac_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_isclose_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_item_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_kthvalue_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lcm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ldexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_le_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_cross_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_log_normal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logaddexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logcumsumexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_xor_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_fill_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_max_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_maximum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mean_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_median_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_min_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_minimum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_movedim_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mul_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_multinomial_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_native_layer_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ne_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_neg_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nextafter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_embedding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_group_norm_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hardtanh_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_huber_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_l1_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multi_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multilabel_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_prelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rms_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rrelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_softshrink_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_normal_in_place_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ormqr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_polar_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_pow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_remainder_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_renorm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_roll_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rot90_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rsub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_bartlett_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_blackman_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_gaussian_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hann_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_kaiser_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_nuttall_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_h_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_he_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_laguerre_polynomial_l_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_legendre_polynomial_p_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_xlog1py_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_zeta_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sum_to_size_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_take_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_trace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_tril_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_triu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_true_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_uniform_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vdot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_where_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_xlogy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcmul_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cat_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_deg2rad_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_flatten_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igammac_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ldexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_min_binary_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_selu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sign_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_airy_ai_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_H_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_T_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___getitem___cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmatmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__batch_norm_with_update_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__chunk_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_lengths_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_offsets_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__softmax_backward_data_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_put_accumulate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__upsample_bilinear2d_aa_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_decomposed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_alias_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_all_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_allclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_aminmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_any_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_arange_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argsort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argwhere_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_partial_views_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_1d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_baddbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bernoulli_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bincount_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_block_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_shapes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_to_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cartesian_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cauchy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_inverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clone_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_column_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_combinations_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_constant_pad_nd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_corrcoef_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_count_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cov_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cross_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumsum_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagflat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diff_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dstack_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_einsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_permuted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_equal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eye_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft2_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flip_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fliplr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flipud_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gather_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gcd_cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geometric_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geqrf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gradient_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hash_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_histc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_add_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_inner_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isreal_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_istft_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_item_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kron_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kthvalue_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lerp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cond_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cross_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_det_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eig_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_householder_product_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_multi_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_slogdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svd_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svdvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vander_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vecdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vector_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logcumsumexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_unpack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mH_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mT_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matrix_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_msort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_multinomial_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmedian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanquantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nansum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_dropout_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_channel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_glu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_head_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_negative_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rms_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_selu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_static_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_fro_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_inf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_nuc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_in_place_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_number_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ormqr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_outer_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pca_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pinverse_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_quantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rand_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_like_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ravel_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_renorm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_interleave_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize_as__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_roll_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rot90_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_3_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scalar_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_searchsorted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sign_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_scatter_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_mm_reduce_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_list_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_multiple_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stack_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_to_size_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_along_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensor_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensordot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tile_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_sparse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_topk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapz_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triangular_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unflatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_uniform_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_consecutive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unravel_index_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_real_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zero__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_like_cuda_float32
2025-12-04T15:02:21.3420613Z 
2025-12-04T15:02:21.3420878Z Finished test_testing 1/1 ... [2025-12-04 15:02:21.158185][21013.167409357], took 0.83min
2025-12-04T15:02:21.3421815Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_testing/test_testing-4c4caba52af0adff.xml
2025-12-04T15:02:21.3422759Z Running inductor/test_autoheuristic 1/1 ... [2025-12-04 15:02:21.298246][21013.307466111]
2025-12-04T15:02:21.3423283Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:02:21.3424363Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_autoheuristic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:21.298584]
2025-12-04T15:02:27.5759152Z 
2025-12-04T15:02:27.5760820Z inductor/test_autoheuristic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_autoheuristic_1.1_6939193d627efb00_.log
2025-12-04T15:02:27.5761675Z Running 0 items in this shard:
2025-12-04T15:02:27.5761866Z 
2025-12-04T15:02:27.5762183Z Finished inductor/test_autoheuristic 1/1 ... [2025-12-04 15:02:27.575553][21019.584778307], took 0.10min
2025-12-04T15:02:27.5944808Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_autoheuristic/inductor.test_autoheuristic-10f7d7896ce04bc8.xml
2025-12-04T15:02:27.6554136Z Running inductor/test_cutedsl_template 1/1 ... [2025-12-04 15:02:27.655052][21019.664274421]
2025-12-04T15:02:27.6554660Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:02:27.6557485Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:27.655390]
2025-12-04T15:02:33.9325088Z 
2025-12-04T15:02:33.9326058Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_c65b62856ae46e85_.log
2025-12-04T15:02:33.9332118Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks
2025-12-04T15:02:33.9337609Z 
2025-12-04T15:02:33.9337940Z Finished inductor/test_cutedsl_template 1/1 ... [2025-12-04 15:02:33.932186][21025.941410258], took 0.10min
2025-12-04T15:02:33.9515575Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-c4d4e9aba2280ad9.xml
2025-12-04T15:02:34.0357579Z Running inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:02:34.035375][21026.044596486]
2025-12-04T15:02:34.0358104Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:02:34.0360690Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:34.035692]
2025-12-04T15:03:05.2106365Z 
2025-12-04T15:03:05.2107524Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_f16e3698532d27f8_.log
2025-12-04T15:03:05.2116337Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu
2025-12-04T15:03:05.2123397Z 
2025-12-04T15:03:05.2123729Z Finished inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:03:05.210321][21057.219544998], took 0.52min
2025-12-04T15:03:05.2306053Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-8a04be886b6d69cf.xml
2025-12-04T15:03:05.3130755Z Running inductor/test_remote_cache 1/1 ... [2025-12-04 15:03:05.312562][21057.321785061]
2025-12-04T15:03:05.3131348Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:05.3133736Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_remote_cache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:05.312907]
2025-12-04T15:03:08.9345850Z 
2025-12-04T15:03:08.9346573Z inductor/test_remote_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_remote_cache_1.1_e90358269eb2823f_.log
2025-12-04T15:03:08.9348325Z Running 3 items in this shard: test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_logging, test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_no_sample, test/inductor/test_remote_cache.py::TestRemoteCache::test_normal_logging
2025-12-04T15:03:08.9349448Z 
2025-12-04T15:03:08.9349760Z Finished inductor/test_remote_cache 1/1 ... [2025-12-04 15:03:08.934250][21060.943470308], took 0.06min
2025-12-04T15:03:08.9551000Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-c7e05865cddca77f.xml
2025-12-04T15:03:08.9948411Z Running inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:03:08.994501][21061.003726562]
2025-12-04T15:03:08.9948959Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:08.9952368Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_coordinate_descent_tuner.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:08.994851]
2025-12-04T15:03:19.4809206Z 
2025-12-04T15:03:19.4810449Z inductor/test_coordinate_descent_tuner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_coordinate_descent_tuner_1.1_2fd6afd7cb5bda25_.log
2025-12-04T15:03:19.4813551Z Running 5 items in this shard: test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_abs_function, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_get_neighbour_values, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_no_neighbors, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_persistent_reduction, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_value_too_large
2025-12-04T15:03:19.4816315Z 
2025-12-04T15:03:19.4816773Z Finished inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:03:19.480500][21071.48972451], took 0.17min
2025-12-04T15:03:19.5000464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6d20a7277844030b.xml
2025-12-04T15:03:19.5735297Z Running inductor/test_inplace_padding 1/1 ... [2025-12-04 15:03:19.573146][21071.582368577]
2025-12-04T15:03:19.5736002Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:19.5738117Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inplace_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:19.573460]
2025-12-04T15:03:38.0253392Z 
2025-12-04T15:03:38.0254466Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_25c4b19bcfb0badf_.log
2025-12-04T15:03:38.0258896Z Running 9 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input
2025-12-04T15:03:38.0262669Z 
2025-12-04T15:03:38.0263003Z Finished inductor/test_inplace_padding 1/1 ... [2025-12-04 15:03:38.024837][21090.034058883], took 0.31min
2025-12-04T15:03:38.0448531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-6a2d2929a87aa7f5.xml
2025-12-04T15:03:38.1321153Z Running inductor/test_cudacodecache 1/1 ... [2025-12-04 15:03:38.131755][21090.140978019]
2025-12-04T15:03:38.1321684Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:38.1324956Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudacodecache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:38.132074]
2025-12-04T15:03:45.8616030Z 
2025-12-04T15:03:45.8616926Z inductor/test_cudacodecache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudacodecache_1.1_20e9a908d42a6261_.log
2025-12-04T15:03:45.8618757Z Running 3 items in this shard: test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_async_compile, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_compilation_error, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_cuda_load
2025-12-04T15:03:45.8620001Z 
2025-12-04T15:03:45.8620327Z Finished inductor/test_cudacodecache 1/1 ... [2025-12-04 15:03:45.861267][21097.870491369], took 0.13min
2025-12-04T15:03:45.8813238Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-b498ae4cc20525c9.xml
2025-12-04T15:03:45.9475686Z Running inductor/test_minifier_utils 1/1 ... [2025-12-04 15:03:45.947193][21097.956416342]
2025-12-04T15:03:45.9476202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:45.9478884Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:45.947500]
2025-12-04T15:03:50.5206994Z 
2025-12-04T15:03:50.5208166Z inductor/test_minifier_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_utils_1.1_82d82b53a102b66f_.log
2025-12-04T15:03:50.5210036Z Running 3 items in this shard: test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_convert_module_to_string, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_invalid_output, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_non_exportable
2025-12-04T15:03:50.5211293Z 
2025-12-04T15:03:50.5211615Z Finished inductor/test_minifier_utils 1/1 ... [2025-12-04 15:03:50.520290][21102.529514054], took 0.08min
2025-12-04T15:03:50.5404240Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-4c5fe50d62df582d.xml
2025-12-04T15:03:50.5707921Z Running inductor/test_debug_trace 1/1 ... [2025-12-04 15:03:50.570417][21102.579642159]
2025-12-04T15:03:50.5708471Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:50.5711097Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:50.570715]
2025-12-04T15:04:06.2173852Z 
2025-12-04T15:04:06.2175012Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_cc4f32af9453e690_.log
2025-12-04T15:04:06.2176736Z Running 3 items in this shard: test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_multi_tempalte, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_printer_const, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_trace
2025-12-04T15:04:06.2177859Z 
2025-12-04T15:04:06.2178168Z Finished inductor/test_debug_trace 1/1 ... [2025-12-04 15:04:06.217037][21118.226261963], took 0.26min
2025-12-04T15:04:06.2372676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-179ecdae5d21ef0e.xml
2025-12-04T15:04:06.3228854Z Running export/test_tree_utils 1/1 ... [2025-12-04 15:04:06.322533][21118.331756169]
2025-12-04T15:04:06.3229367Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:04:06.3232225Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tree_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:06.322847]
2025-12-04T15:04:09.9443947Z 
2025-12-04T15:04:09.9445207Z export/test_tree_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tree_utils_1.1_0e627f819fabbb55_.log
2025-12-04T15:04:09.9446554Z Running 2 items in this shard: test/export/test_tree_utils.py::TestTreeUtils::test_equivalence_check, test/export/test_tree_utils.py::TestTreeUtils::test_reorder_kwargs
2025-12-04T15:04:09.9447288Z 
2025-12-04T15:04:09.9447595Z Finished export/test_tree_utils 1/1 ... [2025-12-04 15:04:09.943974][21121.953198926], took 0.06min
2025-12-04T15:04:09.9641951Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-bacbff1a865ff8bb.xml
2025-12-04T15:04:10.0007222Z Running inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:04:10.000383][21122.009607265]
2025-12-04T15:04:10.0008105Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:04:10.0010189Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:10.000680]
2025-12-04T15:04:26.5969452Z 
2025-12-04T15:04:26.5970810Z inductor/test_triton_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_wrapper_1.1_25aa967110a2fbe1_.log
2025-12-04T15:04:26.5971978Z Running 1 items in this shard: test/inductor/test_triton_wrapper.py::TestTritonWrapper::test_wrapper_using_gpu_seed
2025-12-04T15:04:26.5972502Z 
2025-12-04T15:04:26.5972832Z Finished inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:04:26.596486][21138.605710798], took 0.28min
2025-12-04T15:04:26.6177111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-e71c26709471ff2e.xml
2025-12-04T15:04:26.6906923Z Running inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:04:26.690292][21138.699514947]
2025-12-04T15:04:26.6907585Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:04:26.6909648Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_static_cuda_launcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:26.690607]
2025-12-04T15:04:41.1340659Z 
2025-12-04T15:04:41.1341901Z inductor/test_static_cuda_launcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_static_cuda_launcher_1.1_0c71a221d8835012_.log
2025-12-04T15:04:41.1349813Z Running 17 items in this shard: test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic_1arg, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_constexpr, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_implied_constant, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_many_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_no_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_signed_integers, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_too_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_unsigned_integers, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_any, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_basic_compile, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_disable_static_cuda_launcher, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_incompatible_code, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_static_launch_user_defined_triton_kernels
2025-12-04T15:04:41.1357288Z 
2025-12-04T15:04:41.1357631Z Finished inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:04:41.133598][21153.14281808], took 0.24min
2025-12-04T15:04:41.1548445Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-45ff8ae422230f99.xml
2025-12-04T15:04:41.2384129Z Running inductor/test_provenance_tracing 1/1 ... [2025-12-04 15:04:41.237889][21153.247111481]
2025-12-04T15:04:41.2384862Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:04:41.2386837Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_provenance_tracing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:41.238232]
2025-12-04T15:05:50.5854251Z 
2025-12-04T15:05:50.5855996Z inductor/test_provenance_tracing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_provenance_tracing_1.1_80110daa3530439c_.log
2025-12-04T15:05:50.5869725Z Running 16 items in this shard: test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_combo_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_cuda, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_extern_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingNodeMapping::test_create_node_mapping, test/inductor/test_provenance_tracing.py::TestProvenanceTracingNodeMeta::test_pattern_matcher_transfer_meta, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_cpu_extern_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_create_kernel_information_json_function, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_deferred_triton_kernels, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_kernel_information_generation, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_no_kernel_information_without_provenance_tracking, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_tlparse_kernel_stack_traces, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextCpu::test_aoti_python_stack_traces_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextCpu::test_jit_inductor_with_flag_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextGpu::test_aoti_python_stack_traces_cuda, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextGpu::test_jit_inductor_with_flag_cuda
2025-12-04T15:05:50.5881838Z 
2025-12-04T15:05:50.5882339Z Finished inductor/test_provenance_tracing 1/1 ... [2025-12-04 15:05:50.585034][21222.594257927], took 1.16min
2025-12-04T15:05:50.6071674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_provenance_tracing/inductor.test_provenance_tracing-6455ccf06df051be.xml
2025-12-04T15:05:50.6869353Z Running inductor/test_memory_planning 1/1 ... [2025-12-04 15:05:50.686485][21222.695707979]
2025-12-04T15:05:50.6870202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:05:50.6871699Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_memory_planning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:50.686803]
2025-12-04T15:06:08.4856308Z 
2025-12-04T15:06:08.4857478Z inductor/test_memory_planning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_memory_planning_1.1_fa1d6b036138d22f_.log
2025-12-04T15:06:08.4859688Z Running 4 items in this shard: test/inductor/test_memory_planning.py::TestMemoryPlanning::test_aoti, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_cpp_wrapper, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_python_wrapper, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_unbacked_symint
2025-12-04T15:06:08.4861212Z 
2025-12-04T15:06:08.4861564Z Finished inductor/test_memory_planning 1/1 ... [2025-12-04 15:06:08.485153][21240.494377644], took 0.30min
2025-12-04T15:06:08.5061752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory_planning/inductor.test_memory_planning-d9b25b367275156e.xml
2025-12-04T15:06:08.6044730Z Running export/test_cpp_serdes 1/1 ... [2025-12-04 15:06:08.604080][21240.613302666]
2025-12-04T15:06:08.6045227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:06:08.6047815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_cpp_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:06:08.604399]
2025-12-04T15:07:32.8346367Z 
2025-12-04T15:07:32.8347463Z export/test_cpp_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_cpp_serdes_1.1_75563679f31ba4f4_.log
2025-12-04T15:07:32.8532246Z Running 431 items in this shard: test/export/test_cpp_serdes.py::CppSerdesTestExport::test__scaled_dot_product_flash_attention_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_additional_inputs_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_allow_explicit_guards_as_runtime_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_annotate_on_assert_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_args_type_checked_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_aten_lift_fresh_copy_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_attention_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_attr_assignment_extra_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_constrain_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_constant_relation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_linear_relation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_simple_equality_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_baddbmm_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_non_strict_fake_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_non_strict_real_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_bincount_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_buffer_util_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_constructor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_constructor_torch_ir_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_wrong_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_ccode_python_mod_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cdist_forward_compute_mode_zero_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_check_specialized_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_checks_to_constrain_range_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cleanup_dynamic_markers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_colin_unbacked_backed_vr_sub_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_colon_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_compiling_state_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_access_identical_symint_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_branches_return_constant_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_branches_return_same_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_contains_unbacked_no_escape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_int_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_with_module_stack_export_with_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_with_module_stack_export_with_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_aliasing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_input_naming_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_no_user_inp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_output_dup_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_requires_grad_const_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_return_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_with_non_functional_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_with_non_functional_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_in_eager_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_with_constrain_value_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_with_various_cases_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_conv_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_crop_like_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cse_for_symint_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_functionalize_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_functionalize_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_warn_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_preserve_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_tag_metadata_re_export_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_batch_norm_functional_predispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_item_in_prim_after_decomposition_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_item_in_prim_before_decomposition_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_default_decomposition_core_cia_ops_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_1_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_integer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_repeat_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_simplified_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_repeat_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_nonstrict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_nonstrict_with_stacktrace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_strict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_gpu_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_mutation_float_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_static_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_1_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_auto_and_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_divisibility_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_hint_range_violations_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_hint_ranges_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_disable_forced_specializations_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_disable_forced_specializations_ok_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_gather_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_gather_into_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_reduce_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_to_all_single_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_reduce_scatter_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dont_duck_size_for_auto_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_double_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_aliasing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_list_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_with_nan_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_fake_kernel_inference_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_infers_fake_kernel_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_duplicate_modules_with_non_persistent_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_lr_shift_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_bounds_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_dataclass_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_inferred_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_generic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_user_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_various_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_spec_with_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_wrapped_with_shape_guards_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_sym_round_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_ends_of_bounds_oblivious_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_enum_str_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_error_does_not_reference_eager_fallback_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_error_when_passing_mutating_primitive_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_exception_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_expand_copy_export_handles_implicit_true_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_api_with_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_as_backend_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_lifted_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_symbol_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_symbol_scandim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_subclass_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_symbool_pred_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_warns_constant_pred_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_decomp_table_basic_pop_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_decomp_table_container_methods_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_op_lib_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_triton_kernel_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_triton_kernel_mutable_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cyclic_reference_leak_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomp_torture_case_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomp_torture_case_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomps_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomps_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_dynamo_config_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_run_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_container_type_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_state_dict_hooks_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_default_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_keyword_only_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_pytree_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_keyword_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_keyword_pytree_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_postional_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_function_schema_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_graph_with_no_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_bug_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_dynamic_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_static_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_leak_compile_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_linear_preserve_dynamic_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_max_nonstrict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_max_onnx_reported_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_mod_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_preserve_linear_at_aot_level_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_preserve_linear_but_not_custom_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_rnn_variants_with_warning_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_scan_pytree_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_script_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_statically_known_true_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_then_compile_tensor_ctor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_autocast_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_fake_tensor_inputs_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_inline_constraints_complex_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_inline_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_set_grad_enabled_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_wrong_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_external_call_non_strict_real_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fake_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fake_weights_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_filter_traceback_frames_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_flex_attention_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_float_conversion_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_float_conversion_from_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fqn_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_from_node_metadata_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_full_on_scalar_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_function_holding_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_hints_wrapper_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_hoo_inline_users_issue_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_if_functional_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_if_post_autograd_op_preserved_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inductor_backend_inside_nonstrict_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_class_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_class_method_recursive_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_int_shape_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_intermediate_shape_comp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_invalid_pytree_dynamo_graph_capture_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_exporting_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_nonzero_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_isnonzero_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_113041_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_157289_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_161902_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_istft_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_invalid_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_linear_convd_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_linear_convd_for_training_ir_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_kwarg_dynamic_shapes_diff_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_kwargs_reorder_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_layer_norm_unbacked_normalized_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_layer_sharing_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_lazy_module_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_linear_conv_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_malformed_fqn_from_source_name_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_map_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_map_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mask_nonzero_static_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_masked_select_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_math_pow_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mismatched_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mixed_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_dict_key_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_input_subclasses_parameterization_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_list_slice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_with_dict_container_inp_out_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_modules_access_for_deleted_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_more_multidimensional_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multidimensional_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multinomial_dynamic_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multiple_definitions_same_name_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_namedtuple_input_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_native_multi_attention_head_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_dynamic_shapes_spec_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_fake_tensor_leak_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_constant_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_init_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nn_module_stack_shared_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_check_is_size_error_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_suggested_fixes_for_data_dependent_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_3_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_persistent_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_strict_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_strict_dynamic_shapes_suggested_fixes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_none_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonstrict_retrace_preserves_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonzero_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonzero_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_not_registered_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_operator_aten_tensor_mode_variant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_output_node_name_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pad_sequence_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_param_util_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_partial_patched_forward_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_collisions_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_collisions_hoo_subgraphs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_order_variadic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_update_preserving_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_predispatch_cond_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_predispatch_grad_wrappers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_annotation_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_module_call_signature_unflatten_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_requires_grad_placeholders_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_shape_dynamism_for_unused_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_profiling_code_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_python_asserts_with_sym_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pytree_register_data_class_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pytree_register_nested_data_class_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_range_constraints_with_replacement_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_alias_dtype_mismatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_bool_cast_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_errors_on_aliasing_custom_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_for_max_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_size_mismatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_redundant_assert_max_upper_bound_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_redundant_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_refine_dynamic_shapes_from_suggested_fixes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_register_constant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_repeat_interleave_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_replace_unbacked_with_very_large_upperbound_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_replaced_unbacked_bindings_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_reshape_view_helper_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_retracable_ep_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_retrace_pre_autograd_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decomposition_supports_user_input_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decompositions_keep_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decompositions_keep_tensor_constant_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_for_prim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_for_prm_str_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_with_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sdpa_gqa_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sequential_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_example_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_as_side_effect_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_empty_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_setgrad_lifted_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_shared_submodule_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_simple_export_for_training_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_simple_unbacked_view_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_size_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_slice_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_solver_unsupported_sympy_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_specialize_derived_dim_roots_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_split_const_gm_with_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_stack_trace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_stack_trace_make_fx_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_primitives_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_shape_attribute_assignment_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_tensors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_static_dim_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_context_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_complicated_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_const_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclasses_parameterization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclasses_parameterization_nested_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggest_torch_checks_with_non_negative_check_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggest_torch_checks_with_regular_check_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_for_data_dependent_errors_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_new_roots_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_float_operators_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_or_sym_and_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_sqrt_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symbool_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symfloat_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_additional_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_ranges_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_shapes_collection_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_tensor_return_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tag_ac_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_attribute_zero_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_constant_aten_to_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_constant_with_wrapped_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_multiple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tolist_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_torch_check_eq_commutativity_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_torch_fn_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_trace_under_fake_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_train_eval_on_exported_preautograd_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tril_dynamic_diagonal_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_triu_dynamic_diagonal_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_3d_matmul_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_bincount_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_bindings_for_divisible_u_symint_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_deferred_runtime_retrace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_expand_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_infer_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_kth_value_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_linear_layer_norm_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_noncontig_lin_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_pad_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_scalar_constructor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_slice_forward_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_slice_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_to_cond_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_to_cond_passthrough_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_unsqueeze_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_buffer_update_child2parent_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_isinstance_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_shared_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_state_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_no_unroll_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_placeholder_update_child2parent_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_5_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_6_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_buf_8_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_const_preserving_3_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_const_preserving_3_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_6_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_9_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_preserving_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unused_aliases_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unused_constant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_uplift_common_custom_meta_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_uplift_common_custom_meta_with_multiple_calls_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_use_embedding_twice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_user_input_and_buffer_mutation_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_custom_autograd_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_to_assert_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_where_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_assert_separation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_index_assertions_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_tensor_constant_idx_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_wrapper_module_cpp_serdes
2025-12-04T15:07:32.8718731Z 
2025-12-04T15:07:32.8719034Z Finished export/test_cpp_serdes 1/1 ... [2025-12-04 15:07:32.835381][21324.844604295], took 1.40min
2025-12-04T15:07:32.8720212Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_cpp_serdes/export.test_cpp_serdes-72e11f38870e0d13.xml
2025-12-04T15:07:33.0023764Z Running inductor/test_control_flow 2/4 ... [2025-12-04 15:07:33.001930][21325.011152706]
2025-12-04T15:07:33.0024345Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:07:33.0027102Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:07:33.002294]
2025-12-04T15:18:46.8926987Z 
2025-12-04T15:18:46.8927872Z inductor/test_control_flow 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_2.4_3b4432ec9408add0_.log
2025-12-04T15:18:46.9073895Z Running 184 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_control_flow_with_precomputed_size, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_use_buffers_from_outer_scope, 
test/inductor/test_control_flow.py::CondTests::test_output_on_different_device, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cuda_dynamic_False, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_device_cuda, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_2_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_True_autograd_True
2025-12-04T15:18:46.9175559Z 
2025-12-04T15:18:46.9176189Z Finished inductor/test_control_flow 2/4 ... [2025-12-04 15:18:46.917303][21998.926518615], took 11.23min
2025-12-04T15:18:46.9393050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5ad0fee917746162.xml
2025-12-04T15:18:48.1425297Z Uploading artifacts took 1.12 seconds
2025-12-04T15:18:48.1428801Z Running test_sort_and_select 1/1 ... [2025-12-04 15:18:48.142611][22000.151834358]
2025-12-04T15:18:48.1429274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:18:48.1433685Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sort_and_select.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:48.143043]
2025-12-04T15:18:55.7722393Z 
2025-12-04T15:18:55.7723730Z test_sort_and_select 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sort_and_select_1.1_bec7fa88f7702fb0_.log
2025-12-04T15:18:55.7763934Z Running 111 items in this shard: test/test_sort_and_select.py::TestSortAndSelectCUDA::test_complex_unsupported_cpu_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_dtypes_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_kthvalue_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_kthvalue_scalar_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float64, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_output_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_discontiguous_slow_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_expanded_tensor_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_slice_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_restride_cuda_float32, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_stable_none_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_1d_output_discontiguous_cuda_float32, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_4d_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_arguments_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_lower_precision_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_lower_precision_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_noncontiguous_gpu_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_quantized_scalar_input_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_uint8, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_dim_cuda
2025-12-04T15:18:55.7803241Z 
2025-12-04T15:18:55.7803511Z Finished test_sort_and_select 1/1 ... [2025-12-04 15:18:55.772023][22007.781246429], took 0.13min
2025-12-04T15:18:55.7942017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sort_and_select/test_sort_and_select-049427debff60b53.xml
2025-12-04T15:18:55.9125681Z Running functorch/test_rearrange 1/1 ... [2025-12-04 15:18:55.912180][22007.921403]
2025-12-04T15:18:55.9126160Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:18:55.9129071Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_rearrange.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:55.912522]
2025-12-04T15:18:59.6338037Z 
2025-12-04T15:18:59.6339120Z functorch/test_rearrange 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_rearrange_1.1_a7b15b1a80eb0b56_.log
2025-12-04T15:18:59.6343236Z Running 10 items in this shard: test/functorch/test_rearrange.py::TestRearrange::test_0_dim_tensor, test/functorch/test_rearrange.py::TestRearrange::test_collapsed_ellipsis_errors_out, test/functorch/test_rearrange.py::TestRearrange::test_concatenations_and_stacking, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_no_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_with_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_ellipsis_ops, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_consistency, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_permutations, test/functorch/test_rearrange.py::TestRearrange::test_squeeze, test/functorch/test_rearrange.py::TestRearrange::test_unsqueeze
2025-12-04T15:18:59.6346712Z 
2025-12-04T15:18:59.6347019Z Finished functorch/test_rearrange 1/1 ... [2025-12-04 15:18:59.633461][22011.642685015], took 0.06min
2025-12-04T15:18:59.6553531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-cccd30d217a8d074.xml
2025-12-04T15:18:59.7015160Z Running test_package 1/1 ... [2025-12-04 15:18:59.701176][22011.710400256]
2025-12-04T15:18:59.7015593Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:18:59.7018367Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:59.701510]
2025-12-04T15:19:05.3265408Z 
2025-12-04T15:19:05.3266415Z test_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_package_1.1_f2ef9e9917fb97f5_.log
2025-12-04T15:19:05.3303913Z Running 137 items in this shard: test/test_package.py::TestAnalyze::test_trace_dependencies, test/test_package.py::TestDependencyAPI::test_allow_empty_with_error, test/test_package.py::TestDependencyAPI::test_broken_dependency, test/test_package.py::TestDependencyAPI::test_deny, test/test_package.py::TestDependencyAPI::test_deny_glob, test/test_package.py::TestDependencyAPI::test_extern, test/test_package.py::TestDependencyAPI::test_extern_glob, test/test_package.py::TestDependencyAPI::test_extern_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_externing_c_extension, test/test_package.py::TestDependencyAPI::test_implicit_intern, test/test_package.py::TestDependencyAPI::test_intern_error, test/test_package.py::TestDependencyAPI::test_invalid_import, test/test_package.py::TestDependencyAPI::test_mock, test/test_package.py::TestDependencyAPI::test_mock_glob, test/test_package.py::TestDependencyAPI::test_mock_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_pickle_mocked, test/test_package.py::TestDependencyAPI::test_pickle_mocked_all, test/test_package.py::TestDependencyAPI::test_repackage_mocked_module, test/test_package.py::TestDependencyHooks::test_extern_and_mock_hook, test/test_package.py::TestDependencyHooks::test_multiple_extern_hooks, test/test_package.py::TestDependencyHooks::test_multiple_mock_hooks, test/test_package.py::TestDependencyHooks::test_remove_hooks, test/test_package.py::TestDependencyHooks::test_single_hook, test/test_package.py::TestDiGraph::test_all_paths, test/test_package.py::TestDiGraph::test_contains, test/test_package.py::TestDiGraph::test_contains_non_hashable, test/test_package.py::TestDiGraph::test_edges, test/test_package.py::TestDiGraph::test_forward_closure, test/test_package.py::TestDiGraph::test_iter, test/test_package.py::TestDiGraph::test_node_attr_update, test/test_package.py::TestDiGraph::test_node_attrs, 
test/test_package.py::TestDiGraph::test_predecessor_not_in_graph, test/test_package.py::TestDiGraph::test_predecessors, test/test_package.py::TestDiGraph::test_successor_not_in_graph, test/test_package.py::TestDiGraph::test_successors, test/test_package.py::DirectoryReaderTest::test_importer_access, test/test_package.py::DirectoryReaderTest::test_loading_has_record, test/test_package.py::DirectoryReaderTest::test_loading_module, test/test_package.py::DirectoryReaderTest::test_loading_pickle, test/test_package.py::DirectoryReaderTest::test_package_resource_access, test/test_package.py::DirectoryReaderTest::test_resource_access_by_path, test/test_package.py::DirectoryReaderTest::test_resource_reader, test/test_package.py::DirectoryReaderTest::test_scriptobject_failure_message, test/test_package.py::TestGlobGroup::test_exclude, test/test_package.py::TestGlobGroup::test_exclude_from_all, test/test_package.py::TestGlobGroup::test_invalid_raw, test/test_package.py::TestGlobGroup::test_list_include_exclude, test/test_package.py::TestGlobGroup::test_one_star, test/test_package.py::TestGlobGroup::test_one_star_middle, test/test_package.py::TestGlobGroup::test_one_star_multiple_in_component, test/test_package.py::TestGlobGroup::test_one_star_partial, test/test_package.py::TestGlobGroup::test_one_star_partial_extension, test/test_package.py::TestGlobGroup::test_raw_two_star, test/test_package.py::TestGlobGroup::test_two_star, test/test_package.py::TestGlobGroup::test_two_star_end, test/test_package.py::TestGlobGroup::test_two_star_middle, test/test_package.py::TestGlobGroup::test_two_star_multiple, test/test_package.py::TestImporter::test_ordered_importer_basic, test/test_package.py::TestImporter::test_ordered_importer_whichmodule, test/test_package.py::TestImporter::test_package_importer_whichmodule_no_dunder_module, test/test_package.py::TestImporter::test_single_ordered_importer, test/test_package.py::TestImporter::test_sys_importer, 
test/test_package.py::TestImporter::test_sys_importer_roundtrip, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_fx_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_nn_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_torchscript_module, test/test_package.py::TestMangling::test_demangle_base, test/test_package.py::TestMangling::test_demangler_multiple_manglers, test/test_package.py::TestMangling::test_is_mangled, test/test_package.py::TestMangling::test_mangle_empty_errors, test/test_package.py::TestMangling::test_mangle_prefix, test/test_package.py::TestMangling::test_mangler_is_consistent, test/test_package.py::TestMangling::test_package_mangler, test/test_package.py::TestMangling::test_roundtrip_mangling, test/test_package.py::TestMangling::test_unique_manglers, test/test_package.py::TestMangling::test_unique_module_names, test/test_package.py::TestMisc::test_dunder_package_present, test/test_package.py::TestMisc::test_dunder_package_works_from_package, test/test_package.py::TestMisc::test_exporter_content_lists, test/test_package.py::TestMisc::test_file_structure, test/test_package.py::TestMisc::test_file_structure_has_file, test/test_package.py::TestMisc::test_inspect_class, test/test_package.py::TestMisc::test_is_from_package, test/test_package.py::TestMisc::test_load_python_version_from_package, test/test_package.py::TestMisc::test_loaders_that_remap_files_work_ok, test/test_package.py::TestMisc::test_python_version, test/test_package.py::TestMisc::test_std_lib_sys_hackery_checks, test/test_package.py::ModelTest::test_model_save, test/test_package.py::ModelTest::test_resnet, test/test_package.py::ModelTest::test_script_resnet, test/test_package.py::TestPackageFX::test_package_fx_custom_tracer, test/test_package.py::TestPackageFX::test_package_fx_package, test/test_package.py::TestPackageFX::test_package_fx_simple, test/test_package.py::TestPackageFX::test_package_fx_with_imports, 
test/test_package.py::TestPackageFX::test_package_fx_wrap, test/test_package.py::TestPackageFX::test_package_gm_preserve_stack_trace, test/test_package.py::TestPackageFX::test_package_then_fx, test/test_package.py::TestPackageScript::test_different_package_interface, test/test_package.py::TestPackageScript::test_different_package_script_class, test/test_package.py::TestPackageScript::test_load_shared_scriptmodules, test/test_package.py::TestPackageScript::test_load_shared_tensors, test/test_package.py::TestPackageScript::test_load_shared_tensors_repackaged, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules_shared_code, test/test_package.py::TestPackageScript::test_package_interface, test/test_package.py::TestPackageScript::test_package_script_class, test/test_package.py::TestPackageScript::test_package_script_class_referencing_self, test/test_package.py::TestPackageScript::test_save_eager_mods_sharing_scriptmodule, test/test_package.py::TestPackageScript::test_save_independent_scriptmodules, test/test_package.py::TestPackageScript::test_save_repeat_scriptmodules, test/test_package.py::TestPackageScript::test_save_scriptmodule, test/test_package.py::TestPackageScript::test_save_scriptmodule_file, test/test_package.py::TestPackageScript::test_save_scriptmodule_only_necessary_code, test/test_package.py::TestPackageScript::test_save_scriptmodule_with_submods, test/test_package.py::TestPackageScript::test_save_scriptmodules_in_container, test/test_package.py::TestPackageScript::test_save_scriptmodules_submod_redefinition, test/test_package.py::TestPackageScript::test_save_shared_tensors, test/test_package.py::TestPackageScript::test_saving_and_scripting_packaged_mod, test/test_package.py::TestPackageScript::test_scriptmodules_repeat_save, test/test_package.py::TestPackageScript::test_tensor_sharing_pickle, 
test/test_package.py::TestRepackage::test_repackage_import_indirectly_via_parent_module, test/test_package.py::TestResources::test_importer_access, test/test_package.py::TestResources::test_package_resource_access, test/test_package.py::TestResources::test_resource_access_by_path, test/test_package.py::TestResources::test_resource_reader, test/test_package.py::TestSaveLoad::test_bad_dunder_imports, test/test_package.py::TestSaveLoad::test_dunder_imports, test/test_package.py::TestSaveLoad::test_exporting_mismatched_code, test/test_package.py::TestSaveLoad::test_pickle, test/test_package.py::TestSaveLoad::test_pickle_long_name_with_protocol_4, test/test_package.py::TestSaveLoad::test_save_imported_module, test/test_package.py::TestSaveLoad::test_save_imported_module_using_package_importer, test/test_package.py::TestSaveLoad::test_save_load_fp8, test/test_package.py::TestSaveLoad::test_save_module, test/test_package.py::TestSaveLoad::test_save_module_binary, test/test_package.py::TestSaveLoad::test_saving_source, test/test_package.py::TestSaveLoad::test_saving_string
2025-12-04T15:19:05.3341345Z 
2025-12-04T15:19:05.3341592Z Finished test_package 1/1 ... [2025-12-04 15:19:05.326433][22017.335656914], took 0.09min
2025-12-04T15:19:05.3486816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_package/test_package-a2f65f799bf50b4a.xml
2025-12-04T15:19:05.4000932Z Running test_mkl_verbose 1/1 ... [2025-12-04 15:19:05.399743][22017.408968345]
2025-12-04T15:19:05.4001368Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:05.4003735Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkl_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:05.400040]
2025-12-04T15:19:13.0296894Z 
2025-12-04T15:19:13.0297774Z test_mkl_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkl_verbose_1.1_a8ab8be9a564b785_.log
2025-12-04T15:19:13.0299353Z Running 2 items in this shard: test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_off, test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_on
2025-12-04T15:19:13.0300086Z 
2025-12-04T15:19:13.0300360Z Finished test_mkl_verbose 1/1 ... [2025-12-04 15:19:13.029397][22025.038622174], took 0.13min
2025-12-04T15:19:13.0516391Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-c19a0c4320bf6e65.xml
2025-12-04T15:19:13.1348489Z Running test_utils_config_module 1/1 ... [2025-12-04 15:19:13.134471][22025.143694431]
2025-12-04T15:19:13.1348977Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:13.1351880Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_utils_config_module.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:13.134795]
2025-12-04T15:19:16.8558248Z 
2025-12-04T15:19:16.8559236Z test_utils_config_module 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_config_module_1.1_aa22a3cb4155f80d_.log
2025-12-04T15:19:16.8567018Z Running 22 items in this shard: test/test_utils_config_module.py::TestConfigModule::test_alias, test/test_utils_config_module.py::TestConfigModule::test_bad_jk_type, test/test_utils_config_module.py::TestConfigModule::test_base_value_loading, test/test_utils_config_module.py::TestConfigModule::test_codegen_config, test/test_utils_config_module.py::TestConfigModule::test_codegen_config_function, test/test_utils_config_module.py::TestConfigModule::test_dict_copy_semantics, test/test_utils_config_module.py::TestConfigModule::test_env_name_semantics, test/test_utils_config_module.py::TestConfigModule::test_env_name_string_semantics, test/test_utils_config_module.py::TestConfigModule::test_get_hash, test/test_utils_config_module.py::TestConfigModule::test_invalid_config_float, test/test_utils_config_module.py::TestConfigModule::test_invalid_config_int, test/test_utils_config_module.py::TestConfigModule::test_make_closur_patcher, test/test_utils_config_module.py::TestConfigModule::test_multi_env, test/test_utils_config_module.py::TestConfigModule::test_none_override_semantics, test/test_utils_config_module.py::TestConfigModule::test_overrides, test/test_utils_config_module.py::TestConfigModule::test_patch, test/test_utils_config_module.py::TestConfigModule::test_reference_is_default, test/test_utils_config_module.py::TestConfigModule::test_reference_semantics, test/test_utils_config_module.py::TestConfigModule::test_save_config, test/test_utils_config_module.py::TestConfigModule::test_save_config_portable, test/test_utils_config_module.py::TestConfigModule::test_type_loading, test/test_utils_config_module.py::TestConfigModule::test_unittest_patch
2025-12-04T15:19:16.8574245Z 
2025-12-04T15:19:16.8574535Z Finished test_utils_config_module 1/1 ... [2025-12-04 15:19:16.855501][22028.864725932], took 0.06min
2025-12-04T15:19:16.8778208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_utils_config_module/test_utils_config_module-cd73bdff208ab311.xml
2025-12-04T15:19:16.9111410Z Running test_hop_infra 1/1 ... [2025-12-04 15:19:16.910833][22028.920057255]
2025-12-04T15:19:16.9111857Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:16.9115066Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hop_infra.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:16.911139]
2025-12-04T15:19:21.0329109Z 
2025-12-04T15:19:21.0329908Z test_hop_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hop_infra_1.1_f77bb32afa422f2e_.log
2025-12-04T15:19:21.0331569Z Running 3 items in this shard: test/test_hop_infra.py::TestHOPInfra::test_all_hops_are_imported, test/test_hop_infra.py::TestHOPInfra::test_all_hops_have_opinfo, test/test_hop_infra.py::TestHOPInfra::test_imports_from_all_work
2025-12-04T15:19:21.0332525Z 
2025-12-04T15:19:21.0332775Z Finished test_hop_infra 1/1 ... [2025-12-04 15:19:21.032563][22033.041787652], took 0.07min
2025-12-04T15:19:21.0549096Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_hop_infra/test_hop_infra-d1efcb546b726ee3.xml
2025-12-04T15:19:21.0905091Z Running test_appending_byte_serializer 1/1 ... [2025-12-04 15:19:21.090173][22033.099397978]
2025-12-04T15:19:21.0905624Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:21.0908361Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_appending_byte_serializer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:21.090474]
2025-12-04T15:19:24.7623793Z 
2025-12-04T15:19:24.7625040Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_7e52ee648e02aa85_.log
2025-12-04T15:19:24.7627068Z Running 3 items in this shard: test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_checksum, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_class, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_int
2025-12-04T15:19:24.7628471Z 
2025-12-04T15:19:24.7628824Z Finished test_appending_byte_serializer 1/1 ... [2025-12-04 15:19:24.761916][22036.771140244], took 0.06min
2025-12-04T15:19:24.7845629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_appending_byte_serializer/test_appending_byte_serializer-db1af3fc87bd6240.xml
2025-12-04T15:19:24.8253623Z Running test_ao_sparsity 1/1 ... [2025-12-04 15:19:24.824995][22036.834220217]
2025-12-04T15:19:24.8254227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:24.8256861Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ao_sparsity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:24.825301]
2025-12-04T15:19:36.9104543Z 
2025-12-04T15:19:36.9105532Z test_ao_sparsity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ao_sparsity_1.1_c127cba34d71d100_.log
2025-12-04T15:19:36.9134779Z Running 88 items in this shard: test/test_ao_sparsity.py::TestQuantizedSparseKernels::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear_serdes, test/test_ao_sparsity.py::TestFakeSparsity::test_jit_trace, test/test_ao_sparsity.py::TestFakeSparsity::test_masking_logic, test/test_ao_sparsity.py::TestFakeSparsity::test_state_dict_preserved, test/test_ao_sparsity.py::TestFakeSparsity::test_weights_parametrized, test/test_ao_sparsity.py::TestCubicScheduler::test_constructor, test/test_ao_sparsity.py::TestCubicScheduler::test_step, test/test_ao_sparsity.py::TestScheduler::test_constructor, test/test_ao_sparsity.py::TestScheduler::test_lambda_scheduler, test/test_ao_sparsity.py::TestScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestScheduler::test_step, test/test_ao_sparsity.py::TestBaseSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseSparsifier::test_convert, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params1, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params2, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params3, test/test_ao_sparsity.py::TestBaseSparsifier::test_prepare_config, test/test_ao_sparsity.py::TestBaseSparsifier::test_state_dict, test/test_ao_sparsity.py::TestBaseSparsifier::test_step, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_constructor, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_prepare, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_constructor, 
test/test_ao_sparsity.py::TestWeightNormSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_prepare, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step_2_of_4, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_complex_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_activation_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_bias_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_padding_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_pool_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_activation_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_bias_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_single_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_single_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_linear, test/test_ao_sparsity.py::TestFPGMPruner::test_compute_distance, 
test/test_ao_sparsity.py::TestFPGMPruner::test_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_lstm_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestComposability::test_convert_without_squash_mask, test/test_ao_sparsity.py::TestComposability::test_fusion_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_q_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_qat_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_fusion, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_q_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_qat_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_before_s_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_s_prep_ref_conv, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_q_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_qat_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_q_prep_fx_ref, test/test_ao_sparsity.py::TestActivationSparsifier::test_activation_sparsifier, test/test_ao_sparsity.py::TestBaseDataScheduler::test_constructor, test/test_ao_sparsity.py::TestBaseDataScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestBaseDataScheduler::test_state_dict, test/test_ao_sparsity.py::TestBaseDataScheduler::test_step, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_embeddings, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_parameters, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_tensors, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_embeddings, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_parameters, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_tensors, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_quantize_first, 
test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_sparsify_first, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_for_tensors, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_root
2025-12-04T15:19:36.9162986Z 
2025-12-04T15:19:36.9163242Z Finished test_ao_sparsity 1/1 ... [2025-12-04 15:19:36.910131][22048.919355796], took 0.20min
2025-12-04T15:19:36.9332262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ao_sparsity/test_ao_sparsity-47b60e8cb29a5ef6.xml
2025-12-04T15:19:37.0115810Z Running test_extension_utils 1/1 ... [2025-12-04 15:19:37.011248][22049.020471037]
2025-12-04T15:19:37.0116275Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:37.0119391Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_extension_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:37.011554]
2025-12-04T15:19:40.6324735Z 
2025-12-04T15:19:40.6325572Z test_extension_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_extension_utils_1.1_7f66e708b7c7a8bc_.log
2025-12-04T15:19:40.6327057Z Running 2 items in this shard: test/test_extension_utils.py::TestExtensionUtils::test_external_module_register, test/test_extension_utils.py::TestExtensionUtils::test_external_module_register_with_renamed_backend
2025-12-04T15:19:40.6328003Z 
2025-12-04T15:19:40.6328278Z Finished test_extension_utils 1/1 ... [2025-12-04 15:19:40.632156][22052.641379694], took 0.06min
2025-12-04T15:19:40.6564522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_extension_utils/test_extension_utils-5e3baa267a09a3bb.xml
2025-12-04T15:19:40.6866587Z Running nn/attention/test_fa4 1/1 ... [2025-12-04 15:19:40.686191][22052.69541583]
2025-12-04T15:19:40.6867042Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:40.6868509Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/attention/test_fa4.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:40.686492]
2025-12-04T15:19:44.6091035Z 
2025-12-04T15:19:44.6091900Z nn/attention/test_fa4 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.attention.test_fa4_1.1_59632c9893caec1b_.log
2025-12-04T15:19:44.6139879Z Running 66 items in this shard: test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_fa4_kernel_called_bfloat16_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_fa4_kernel_called_float16_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_float16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_float16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_float16
2025-12-04T15:19:44.6185877Z 
2025-12-04T15:19:44.6186188Z Finished nn/attention/test_fa4 1/1 ... [2025-12-04 15:19:44.608688][22056.617911862], took 0.07min
2025-12-04T15:19:44.6332598Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.attention.test_fa4/nn.attention.test_fa4-2d55ad78ccee943a.xml
2025-12-04T15:19:44.6688000Z Running typing/test_python_operators 1/1 ... [2025-12-04 15:19:44.668377][22056.677602487]
2025-12-04T15:19:44.6688510Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:44.6691123Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'typing/test_python_operators.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:44.668719]
2025-12-04T15:19:48.9418842Z 
2025-12-04T15:19:48.9419981Z typing/test_python_operators 1/1 was successful, full logs can be found in artifacts with path test/test-reports/typing.test_python_operators_1.1_1dbf7db937cf8b4b_.log
2025-12-04T15:19:48.9534707Z Running 318 items in this shard: test/typing/test_python_operators.py::TestPythonOperators::test_binary_a100_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a101_op_%_b101, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a102_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a103_op_%_b103, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a104_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a105_op_*_b105, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a106_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a107_op_*_b107, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a108_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a109_op_**_b109, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a110_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a111_op_**_b111, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a112_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a113_op_+_b113, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a114_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a115_op_+_b115, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a116_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a117_op_-_b117, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a118_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a119_op_-_b119, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a120_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a121_op_/_b121, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a122_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a123_op_/_b123, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a124_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a125_op_//_b125, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a126_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a127_op_//_b127, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a128_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a129_op_&_b129, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a130_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a131_op_&_b131, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a132_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a133_op_<<_b133, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a134_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a135_op_<<_b135, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a136_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a137_op_>>_b137, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a138_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a139_op_>>_b139, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a140_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a141_op_^_b141, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a142_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a143_op_^_b143, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a144_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a145_op_|_b145, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a146_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a147_op_|_b147, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a148_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a149_op_@_b149, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a150_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a151_op_@_b151, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a228_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a229_op_!=_b229, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a230_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a231_op_!=_b231, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a232_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a233_op_<_b233, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a234_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a235_op_<_b235, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a236_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a237_op_<=_b237, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a238_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a239_op_<=_b239, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a240_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a241_op_==_b241, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a242_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a243_op_==_b243, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a244_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a245_op_>_b245, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a246_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a247_op_>_b247, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a248_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a249_op_>=_b249, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a250_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a251_op_>=_b251, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a252_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a253_op_%_b253, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a254_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a255_op_%_b255, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a256_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a257_op_*_b257, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a258_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a259_op_*_b259, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a260_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a261_op_**_b261, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a262_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a263_op_**_b263, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a264_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a265_op_+_b265, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a266_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a267_op_+_b267, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a268_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a269_op_-_b269, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a270_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a271_op_-_b271, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a272_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a273_op_/_b273, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a274_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a275_op_/_b275, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a276_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a277_op_//_b277, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a278_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a279_op_//_b279, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a280_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a281_op_&_b281, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a282_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a283_op_&_b283, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a284_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a285_op_<<_b285, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a286_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a287_op_<<_b287, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a288_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a289_op_>>_b289, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a290_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a291_op_>>_b291, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a292_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a293_op_^_b293, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a294_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a295_op_^_b295, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a296_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a297_op_|_b297, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a298_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a299_op_|_b299, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a300_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a301_op_@_b301, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a302_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a303_op_@_b303, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a76_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a77_op_!=_b77, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a78_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a79_op_!=_b79, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a80_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a81_op_<_b81, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a82_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a83_op_<_b83, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a84_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a85_op_<=_b85, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a86_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a87_op_<=_b87, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a88_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a89_op_==_b89, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a90_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a91_op_==_b91, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a92_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a93_op_>_b93, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a94_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a95_op_>_b95, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a96_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a97_op_>=_b97, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a98_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a99_op_>=_b99, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b1, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b25, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b27, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b53, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b55, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b33, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b35, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b29, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b31, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b37, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b39, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b41, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b43, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b49, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b51, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b45, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b47, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b57, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b59, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b11, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b9, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b7, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b13, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b15, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b21, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b23, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b61, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b63, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b17, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b19, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b73, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b75, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b65, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b67, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b69, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b71, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b153, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b155, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b177, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b179, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b205, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b207, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b185, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b187, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b181, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b183, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b189, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b191, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b193, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b195, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b201, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b203, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b197, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b199, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b209, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b211, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b161, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b163, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b157, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b159, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b165, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b167, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b173, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b175, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b213, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b215, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b169, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b171, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b225, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b227, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b217, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b219, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b221, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b223, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_operators_are_correct_and_complete, test/typing/test_python_operators.py::TestPythonOperators::test_type_tests_are_complete, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a1, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a7, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a11, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a9, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_3
2025-12-04T15:19:48.9647746Z 
2025-12-04T15:19:48.9648174Z Finished typing/test_python_operators 1/1 ... [2025-12-04 15:19:48.941895][22060.951117554], took 0.07min
2025-12-04T15:19:48.9651066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/typing.test_python_operators/typing.test_python_operators-7b01e9f4c56696ce.xml
2025-12-04T15:19:49.0020205Z Running torch_np/test_dtype 1/1 ... [2025-12-04 15:19:49.001572][22061.010797149]
2025-12-04T15:19:49.0020893Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:49.0022183Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:49.001875]
2025-12-04T15:19:52.9244807Z 
2025-12-04T15:19:52.9245938Z torch_np/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_dtype_1.1_8ba7a24ba508317e_.log
2025-12-04T15:19:52.9268566Z Running 44 items in this shard: test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_bool, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int32', 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.bool_, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex128, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.dtype('bool'), test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int8, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint8
2025-12-04T15:19:52.9299071Z 
2025-12-04T15:19:52.9299479Z Finished torch_np/test_dtype 1/1 ... [2025-12-04 15:19:52.924168][22064.933392502], took 0.07min
2025-12-04T15:19:52.9501036Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_dtype/torch_np.test_dtype-50c590a3e827391c.xml
2025-12-04T15:19:53.0386002Z Running test_file_check 1/1 ... [2025-12-04 15:19:53.038230][22065.04745425]
2025-12-04T15:19:53.0386632Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:53.0389380Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_file_check.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:53.038545]
2025-12-04T15:19:58.3632107Z 
2025-12-04T15:19:58.3633128Z test_file_check 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_file_check_1.1_e6044214ffdb04bb_.log
2025-12-04T15:19:58.3634244Z Running 2 items in this shard: test/test_file_check.py::TestFileCheck::test_all_python_api, test/test_file_check.py::TestFileCheck::test_not_run
2025-12-04T15:19:58.3636012Z 
2025-12-04T15:19:58.3636270Z Finished test_file_check 1/1 ... [2025-12-04 15:19:58.362811][22070.372035669], took 0.09min
2025-12-04T15:19:58.3866933Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_file_check/test_file_check-c5f916d4f839abe2.xml
2025-12-04T15:19:58.4181199Z Running profiler/test_kineto 1/1 ... [2025-12-04 15:19:58.417733][22070.426957839]
2025-12-04T15:19:58.4182220Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:19:58.4184087Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_kineto.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:58.418042]
2025-12-04T15:20:13.5105352Z 
2025-12-04T15:20:13.5106955Z profiler/test_kineto 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_kineto_1.1_3901a608b259f0c8_.log
2025-12-04T15:20:13.5109156Z Running 1 items in this shard: test/profiler/test_kineto.py::SimpleKinetoInitializationTest::test_kineto_profiler_with_environment_variable
2025-12-04T15:20:13.5110170Z 
2025-12-04T15:20:13.5110627Z Finished profiler/test_kineto 1/1 ... [2025-12-04 15:20:13.510191][22085.519416141], took 0.25min
2025-12-04T15:20:13.5342244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_kineto/profiler.test_kineto-1437f02ea71dbd19.xml
2025-12-04T15:20:13.6772492Z Running functorch/test_ac_knapsack 1/1 ... [2025-12-04 15:20:13.676840][22085.686063825]
2025-12-04T15:20:13.6773263Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:20:13.6777381Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac_knapsack.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:13.677164]
2025-12-04T15:20:17.5493759Z 
2025-12-04T15:20:17.5494843Z functorch/test_ac_knapsack 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_knapsack_1.1_a4a52ea27bf21bce_.log
2025-12-04T15:20:17.5503593Z Running 17 items in this shard: test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_full_joint_nx_graph, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_knapsack_memory_input, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_knapsack_runtime_input, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_non_ac_peak_memory, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_theoretical_max_runtime, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_inialize_from_graph, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_recomputable_node_only_graph, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_recomputable_node_only_graph_with_larger_graph_context, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_simplified_fx_joint_graph, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_distribution_of_results_for_knapsack_algo, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_knapsack_output_accounting_for_backward_pass, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_knapsack_output_not_accounting_for_backward_pass, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_knapsack_output_with_wrong_sized_values, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_get_backward_memory_from_topologically_sorted_graph, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_get_knee_point_memory_budget, test/functorch/test_ac_knapsack.py::TestActivationCheckpointingKnapsack::test_dp_knapsack, test/functorch/test_ac_knapsack.py::TestActivationCheckpointingKnapsack::test_dp_knapsack_sliding_hirschberg
2025-12-04T15:20:17.5511920Z 
2025-12-04T15:20:17.5512333Z Finished functorch/test_ac_knapsack 1/1 ... [2025-12-04 15:20:17.549011][22089.558235709], took 0.06min
2025-12-04T15:20:17.5727489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ac_knapsack/functorch.test_ac_knapsack-a2f3dae1f99bc885.xml
2025-12-04T15:20:17.6198327Z Running torch_np/test_nep50_examples 1/1 ... [2025-12-04 15:20:17.619460][22089.628685653]
2025-12-04T15:20:17.6198830Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:20:17.6201558Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_nep50_examples.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:17.619755]
2025-12-04T15:20:22.8435000Z 
2025-12-04T15:20:22.8436097Z torch_np/test_nep50_examples 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_nep50_examples_1.1_be93e5fc5572125c_.log
2025-12-04T15:20:22.9185140Z Running 1573 items in this shard: test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_3j + array(3, complex64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_True + uint8(2), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array(1_0, float32) + 1e-14 == 1_0, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([0_1], float32) == float64(0_1), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([100], uint8) + 200, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 1, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 200, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 300, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + array(1, int64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + int64(1), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_0], float32) + 1e-14 == 1_0, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + 3, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + array(1_, float64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + float64(1_), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + int64(3), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_bool_(True) + 1, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(1) + 1j, 
test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(1) + 3e100, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(5) + 5j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int16(2) + 2, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int16(4) + 4j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int32(1) + 5j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(1) + 2, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(1) + 300, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(100) + 200, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array1, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array1, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array26, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar32_array32, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array22, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar30_array30, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar30_array30, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar30_array30, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar30_array30, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar30_array30, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array20, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar28_array28, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array18, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array18, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar27_array27, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array7, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array17, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array7, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array17, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array16, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array16, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array16, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array13, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array12, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array12, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array1, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array26, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar32_array32, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array22, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar30_array30, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array20, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar28_array28, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar28_array28, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar28_array28, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar28_array28, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array20, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar29_array29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array20, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar29_array29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array20, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar28_array28, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array18, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array7, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array17, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array7, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array17, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array3, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array12, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array1, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar27_array27_dtype27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar28_array28_dtype28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar29_array29_dtype29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar30_array30_dtype30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar31_array31_dtype31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar32_array32_dtype32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar33_array33_dtype33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar34_array34_dtype34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar35_array35_dtype35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array10_dtype10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array11_dtype11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array12_dtype12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array13_dtype13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array14_dtype14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array15_dtype15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array16_dtype16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array17_dtype17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array9_dtype9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array18_dtype18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array19_dtype19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array20_dtype20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array21_dtype21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array22_dtype22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array23_dtype23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array24_dtype24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array25_dtype25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array26_dtype26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array0_dtype0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array1_dtype1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array2_dtype2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array3_dtype3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array4_dtype4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array5_dtype5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array6_dtype6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array7_dtype7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array8_dtype8
2025-12-04T15:20:22.9909423Z 
2025-12-04T15:20:22.9909750Z Finished torch_np/test_nep50_examples 1/1 ... [2025-12-04 15:20:22.846271][22094.855494381], took 0.09min
2025-12-04T15:20:22.9910888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_nep50_examples/torch_np.test_nep50_examples-87e42828c2fde829.xml
2025-12-04T15:20:22.9911888Z Running test_torch 1/1 ... [2025-12-04 15:20:22.955177][22094.964399308]
2025-12-04T15:20:22.9912291Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:20:22.9913269Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_torch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:22.955502]
2025-12-04T15:21:47.2122406Z 
2025-12-04T15:21:47.2123267Z test_torch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_torch_1.1_ed3627b67cdc077e_.log
2025-12-04T15:21:47.2449776Z Running 976 items in this shard: test/test_torch.py::TestBasicVitalSigns::test_basic_vitals, test/test_torch.py::TestBasicVitalSigns::test_basic_vitals_read_write, test/test_torch.py::TestBasicVitalSigns::test_dataloader_vitals, test/test_torch.py::TestTorch::test_RNGState, test/test_torch.py::TestTorch::test_RNGStateAliasing, test/test_torch.py::TestTorch::test_RNG_after_pickle, test/test_torch.py::TestTorch::test_Size, test/test_torch.py::TestTorch::test_Size_concat_non_tuple_sequence, test/test_torch.py::TestTorch::test_Size_concat_wildcard, test/test_torch.py::TestTorch::test_Size_iter, test/test_torch.py::TestTorch::test_Size_scalar, test/test_torch.py::TestTorch::test_add_meta_scalar, test/test_torch.py::TestTorch::test_allow_tensor_metadata_change, test/test_torch.py::TestTorch::test_apply, test/test_torch.py::TestTorch::test_as_subclass, test/test_torch.py::TestTorch::test_assert_async, test/test_torch.py::TestTorch::test_backward_hooks_traverse, test/test_torch.py::TestTorch::test_batch_norm_cpu_inference, test/test_torch.py::TestTorch::test_bf16_supported_on_cpu, test/test_torch.py::TestTorch::test_bmm_multithreaded, test/test_torch.py::TestTorch::test_boxMullerState, test/test_torch.py::TestTorch::test_cat_neg_dim, test/test_torch.py::TestTorch::test_check, test/test_torch.py::TestTorch::test_chunk_neg_dim, test/test_torch.py::TestTorch::test_conj_neg_tolist, test/test_torch.py::TestTorch::test_conj_physical_meta_stride, test/test_torch.py::TestTorch::test_contains, test/test_torch.py::TestTorch::test_copy_broadcast, test/test_torch.py::TestTorch::test_copy_dtypes, test/test_torch.py::TestTorch::test_copy_float16, test/test_torch.py::TestTorch::test_copy_many_to_one, test/test_torch.py::TestTorch::test_copy_transpose, test/test_torch.py::TestTorch::test_cuda_not_built, test/test_torch.py::TestTorch::test_cummax_neg_dim, test/test_torch.py::TestTorch::test_cummin_neg_dim, test/test_torch.py::TestTorch::test_cumprod_neg_dim, 
test/test_torch.py::TestTorch::test_cumsum_neg_dim, test/test_torch.py::TestTorch::test_cxx_flags, test/test_torch.py::TestTorch::test_data_ptr_of_empty_tensor_with_storage, test/test_torch.py::TestTorch::test_data_ptr_of_empty_view_with_storage, test/test_torch.py::TestTorch::test_deepcopy_gradient, test/test_torch.py::TestTorch::test_deepcopy_parameter, test/test_torch.py::TestTorch::test_deterministic_fill_uninitialized_memory, test/test_torch.py::TestTorch::test_deterministic_flag, test/test_torch.py::TestTorch::test_device, test/test_torch.py::TestTorch::test_dim_order, test/test_torch.py::TestTorch::test_dir, test/test_torch.py::TestTorch::test_doc, test/test_torch.py::TestTorch::test_doc_template, test/test_torch.py::TestTorch::test_dot_data_use, test/test_torch.py::TestTorch::test_dtype_is_signed, test/test_torch.py::TestTorch::test_element_size, test/test_torch.py::TestTorch::test_empty_meta, test/test_torch.py::TestTorch::test_empty_storage_view, test/test_torch.py::TestTorch::test_equal, test/test_torch.py::TestTorch::test_error_msg_type_translation, test/test_torch.py::TestTorch::test_fill_diagonal, test/test_torch.py::TestTorch::test_format_scalar_meta, test/test_torch.py::TestTorch::test_from_buffer, test/test_torch.py::TestTorch::test_from_file, test/test_torch.py::TestTorch::test_gather_neg_dim, test/test_torch.py::TestTorch::test_generator_cpu, test/test_torch.py::TestTorch::test_get_cpu_capability, test/test_torch.py::TestTorch::test_has_internal_overlap, test/test_torch.py::TestTorch::test_has_storage, test/test_torch.py::TestTorch::test_index_add, test/test_torch.py::TestTorch::test_index_add_all_dtypes, test/test_torch.py::TestTorch::test_index_add_cornercase, test/test_torch.py::TestTorch::test_index_add_correctness, test/test_torch.py::TestTorch::test_index_add_neg_dim, test/test_torch.py::TestTorch::test_index_copy_neg_dim, test/test_torch.py::TestTorch::test_index_fill_neg_dim, test/test_torch.py::TestTorch::test_index_select_neg_dim, 
test/test_torch.py::TestTorch::test_invalid_arg_error_handling, test/test_torch.py::TestTorch::test_invalid_generator_raises, test/test_torch.py::TestTorch::test_is_nonzero, test/test_torch.py::TestTorch::test_is_same_size, test/test_torch.py::TestTorch::test_iter, test/test_torch.py::TestTorch::test_kthvalue_neg_dim, test/test_torch.py::TestTorch::test_linspace_logspace, test/test_torch.py::TestTorch::test_logcumsumexp_neg_dim, test/test_torch.py::TestTorch::test_manual_seed, test/test_torch.py::TestTorch::test_map, test/test_torch.py::TestTorch::test_map2, test/test_torch.py::TestTorch::test_max_neg_dim, test/test_torch.py::TestTorch::test_mean_neg_dim, test/test_torch.py::TestTorch::test_median_neg_dim, test/test_torch.py::TestTorch::test_memory_format, test/test_torch.py::TestTorch::test_memory_format_contiguous_returns_same_tensor_if_already_satisfies, test/test_torch.py::TestTorch::test_memory_format_empty, test/test_torch.py::TestTorch::test_min_neg_dim, test/test_torch.py::TestTorch::test_mode_neg_dim, test/test_torch.py::TestTorch::test_multinomial_invalid_probs, test/test_torch.py::TestTorch::test_nanmedian_neg_dim, test/test_torch.py::TestTorch::test_narrow_neg_dim, test/test_torch.py::TestTorch::test_nbytes, test/test_torch.py::TestTorch::test_ndim, test/test_torch.py::TestTorch::test_new, test/test_torch.py::TestTorch::test_newaxis_numpy_comparison, test/test_torch.py::TestTorch::test_newindex, test/test_torch.py::TestTorch::test_no_cuda_monkeypatch, test/test_torch.py::TestTorch::test_norm_neg_dim, test/test_torch.py::TestTorch::test_normal_shape, test/test_torch.py::TestTorch::test_numel, test/test_torch.py::TestTorch::test_parallel_info, test/test_torch.py::TestTorch::test_parsing_double, test/test_torch.py::TestTorch::test_parsing_int64, test/test_torch.py::TestTorch::test_parsing_intlist, test/test_torch.py::TestTorch::test_permute, test/test_torch.py::TestTorch::test_pickle, test/test_torch.py::TestTorch::test_pickle_dtype, 
test/test_torch.py::TestTorch::test_pickle_function, test/test_torch.py::TestTorch::test_pickle_generator, test/test_torch.py::TestTorch::test_pickle_parameter, test/test_torch.py::TestTorch::test_pickle_parameter_no_requires_grad, test/test_torch.py::TestTorch::test_pickle_size, test/test_torch.py::TestTorch::test_pin_memory, test/test_torch.py::TestTorch::test_print, test/test_torch.py::TestTorch::test_prod_neg_dim, test/test_torch.py::TestTorch::test_pyobj_preserved, test/test_torch.py::TestTorch::test_qengine, test/test_torch.py::TestTorch::test_renorm_neg_dim, test/test_torch.py::TestTorch::test_resizable, test/test_torch.py::TestTorch::test_reversed, test/test_torch.py::TestTorch::test_scatter_neg_dim, test/test_torch.py::TestTorch::test_select_neg_dim, test/test_torch.py::TestTorch::test_set_flush_denormal, test/test_torch.py::TestTorch::test_setting_real_imag_to_a_number, test/test_torch.py::TestTorch::test_show_config, test/test_torch.py::TestTorch::test_size_neg_dim, test/test_torch.py::TestTorch::test_size_stride, test/test_torch.py::TestTorch::test_sizeof, test/test_torch.py::TestTorch::test_slice, test/test_torch.py::TestTorch::test_slow_test, test/test_torch.py::TestTorch::test_sobolengine_bounds, test/test_torch.py::TestTorch::test_sobolengine_bounds_scrambled, test/test_torch.py::TestTorch::test_sobolengine_continuing, test/test_torch.py::TestTorch::test_sobolengine_continuing_scrambled, test/test_torch.py::TestTorch::test_sobolengine_default_dtype, test/test_torch.py::TestTorch::test_sobolengine_distribution, test/test_torch.py::TestTorch::test_sobolengine_distribution_scrambled, test/test_torch.py::TestTorch::test_sobolengine_draw, test/test_torch.py::TestTorch::test_sobolengine_draw_base2, test/test_torch.py::TestTorch::test_sobolengine_draw_base2_scrambled, test/test_torch.py::TestTorch::test_sobolengine_draw_scrambled, test/test_torch.py::TestTorch::test_sobolengine_fast_forward, 
test/test_torch.py::TestTorch::test_sobolengine_fast_forward_scrambled, test/test_torch.py::TestTorch::test_sobolengine_first_point, test/test_torch.py::TestTorch::test_sobolengine_high_dim, test/test_torch.py::TestTorch::test_sobolengine_raise, test/test_torch.py::TestTorch::test_sobolengine_reset, test/test_torch.py::TestTorch::test_sobolengine_reset_scrambled, test/test_torch.py::TestTorch::test_sort_neg_dim, test/test_torch.py::TestTorch::test_split_neg_dim, test/test_torch.py::TestTorch::test_split_with_sizes_copy_out, test/test_torch.py::TestTorch::test_squeeze_neg_dim, test/test_torch.py::TestTorch::test_std_neg_dim, test/test_torch.py::TestTorch::test_storage_base_init, test/test_torch.py::TestTorch::test_storage_base_new, test/test_torch.py::TestTorch::test_storage_byteswap, test/test_torch.py::TestTorch::test_storage_casts, test/test_torch.py::TestTorch::test_storage_cycle_via_dict, test/test_torch.py::TestTorch::test_storage_cycle_via_slots, test/test_torch.py::TestTorch::test_storage_dead_weak_ref, test/test_torch.py::TestTorch::test_storage_dealloc, test/test_torch.py::TestTorch::test_storage_dealloc_resurrected, test/test_torch.py::TestTorch::test_storage_dealloc_subclass_resurrected, test/test_torch.py::TestTorch::test_storage_dealloc_subclass_zombie, test/test_torch.py::TestTorch::test_storage_dict_dealloc, test/test_torch.py::TestTorch::test_storage_error, test/test_torch.py::TestTorch::test_storage_error_no_attribute, test/test_torch.py::TestTorch::test_storage_finalizer_dealloc, test/test_torch.py::TestTorch::test_storage_fix_weakref_no_leak, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc_resurrected, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc_zombie, test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context, test/test_torch.py::TestTorch::test_storage_resurrected_weak_ref, 
test/test_torch.py::TestTorch::test_storage_slot_dealloc, test/test_torch.py::TestTorch::test_storage_thread_safety, test/test_torch.py::TestTorch::test_storage_weakref_dealloc, test/test_torch.py::TestTorch::test_structseq_repr, test/test_torch.py::TestTorch::test_subclass_preserved, test/test_torch.py::TestTorch::test_subclass_tensors, test/test_torch.py::TestTorch::test_sum_neg_dim, test/test_torch.py::TestTorch::test_swap_basic, test/test_torch.py::TestTorch::test_swap_fail_slots, test/test_torch.py::TestTorch::test_t_not_2d_error, test/test_torch.py::TestTorch::test_tensor_base_init, test/test_torch.py::TestTorch::test_tensor_base_new, test/test_torch.py::TestTorch::test_tensor_ctor_scalar, test/test_torch.py::TestTorch::test_tensor_cycle_via_dict, test/test_torch.py::TestTorch::test_tensor_cycle_via_slots, test/test_torch.py::TestTorch::test_tensor_dead_weak_ref, test/test_torch.py::TestTorch::test_tensor_dict_dealloc, test/test_torch.py::TestTorch::test_tensor_finalizer_dealloc, test/test_torch.py::TestTorch::test_tensor_fix_weakref_no_leak, test/test_torch.py::TestTorch::test_tensor_item_no_warning, test/test_torch.py::TestTorch::test_tensor_ressurecting_clear, test/test_torch.py::TestTorch::test_tensor_resurrected_weak_ref, test/test_torch.py::TestTorch::test_tensor_set, test/test_torch.py::TestTorch::test_tensor_set_errors, test/test_torch.py::TestTorch::test_tensor_slot_dealloc, test/test_torch.py::TestTorch::test_tensor_weakref_dealloc, test/test_torch.py::TestTorch::test_tensor_where_scalar, test/test_torch.py::TestTorch::test_tensor_with_grad_to_scalar_warning, test/test_torch.py::TestTorch::test_tensoriterator_output_setup, test/test_torch.py::TestTorch::test_terminate_handler_on_crash, test/test_torch.py::TestTorch::test_to, test/test_torch.py::TestTorch::test_to_with_tensor, test/test_torch.py::TestTorch::test_topk_neg_dim, test/test_torch.py::TestTorch::test_torch_from_file, test/test_torch.py::TestTorch::test_transpose_neg_dim, 
test/test_torch.py::TestTorch::test_type, test/test_torch.py::TestTorch::test_type_alias, test/test_torch.py::TestTorch::test_type_conversion_via_dtype_name, test/test_torch.py::TestTorch::test_typed_storage_deprecation_warning, test/test_torch.py::TestTorch::test_typed_storage_internal_no_warning, test/test_torch.py::TestTorch::test_unbind_neg_dim, test/test_torch.py::TestTorch::test_unflatten, test/test_torch.py::TestTorch::test_unfold_neg_dim, test/test_torch.py::TestTorch::test_unsqueeze_neg_dim, test/test_torch.py::TestTorch::test_upsample_nearest1d_meta, test/test_torch.py::TestTorch::test_upsample_nearest2d_meta, test/test_torch.py::TestTorch::test_var_neg_dim, test/test_torch.py::TestTorch::test_warn_types, test/test_torch.py::TestTorch::test_wildcard_import, test/test_torch.py::TestVitalSignsCudaCUDA::test_cuda_vitals_gpu_only_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test__local_scalar_dense_with_empty_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_cuda_errors_with_cpu_scalars_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_complex64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_assertRaisesRegex_ignore_msg_non_native_device_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bfloat16_neg_abs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bool_tensor_value_change_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_add_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_addcdiv_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_addcmul_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_atan2_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_copy_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_dist_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_div_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_eq_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_fmod_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_ge_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_gt_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_le_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_lerp_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_lt_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_map2_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_map_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_fill_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_scatter_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_select_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_max_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_min_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_mul_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_ne_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_pow_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_remainder_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_sub_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_no_inf_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_no_inf_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_cuda_backward_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_euclidean_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_grad_p_lt_1_no_nan_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_large_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_non_contiguous_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_non_contiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_norm_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_norm_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_same_inputs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_check_tensor_all_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_check_tensor_internal_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_all_dtypes_and_devices_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_not_memory_dense_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_zero_stride_dim_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_complex_half_experimental_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_constants_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_conv_transposed_backward_agnostic_to_memory_format_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_conv_transposed_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_all_dtypes_and_devices_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_math_view_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_mem_overlap_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_int32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_cov_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cpp_warnings_have_python_context_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummax_cummin_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummax_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummin_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumprod_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_64bit_indexing_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_outer_dim_64bit_indexing_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_replication_pad2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_device_guard_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_bool, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dim_function_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_discontiguous_out_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dist_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dtypetensor_warnings_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_expected_failure_xla_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_no_zero_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_no_zero_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gather_backward_deterministic_path_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gather_backward_one_dim_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scale_will_not_overflow_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaler_deprecated_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaler_pass_itself_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_accumulation_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_clipping_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_clipping_separate_unscale_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_multiple_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_penalty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_state_dict_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_sparse_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_update_scale_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_int64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_type_promotion_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_hook_remove_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_add_large_inputs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_add_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_copy_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_fill_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_put_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_int64_upsample3d_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_invalid_shapes_grid_sampler_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_is_set_to_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_is_signed_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e4m3fn, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e4m3fnuz, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e5m2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e5m2fnuz, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_large_cumprod_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_large_cumsum_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_logcumsumexp_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lognormal_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_bool_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bfloat16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bfloat16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bool_bool, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bool_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex128_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex128_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float32_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float32_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int32_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int32_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int8_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int8_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_uint8_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_uint8_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_bool_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_inplace_noncontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_large_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_uint8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_clone_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_consistency_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_cpu_and_cuda_ops_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_empty_like_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_factory_like_functions_preserve_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_operators_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_preserved_after_permute_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_propagation_rules_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_to_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_type_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_type_shortcuts_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_module_share_memory_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_device_constrain_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_empty_w_replacement_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_empty_wo_replacement_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_gpu_device_constrain_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_rng_state_advance_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_narrow_copy_non_contiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_narrow_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_no_nondeterministic_alert_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_no_nondeterministic_alert_interpolate_trilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveAvgPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveAvgPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveMaxPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AvgPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_CTCLoss_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_EmbeddingBag_max_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_FractionalMaxPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_FractionalMaxPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_NLLLoss_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReflectionPad1d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReflectionPad3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad1d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_bincount_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_grid_sample_2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_grid_sample_3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_bicubic_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_linear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_trilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_median_cuda_float64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_put_accumulate_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_put_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_qint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_qint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint2x4, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint4x2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nullary_op_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pairwise_distance_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pdist_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pdist_norm_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pickle_gradscaler_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pin_memory_from_constructor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_reduced_type_float_copy_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_reduced_type_float_copy_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_repeat_interleave_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scalar_check_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_bool_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_non_unique_index_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_one_dim_deterministic_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_to_large_input_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_bool_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_mem_overlap_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_multiply_unsupported_dtypes_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_multiply_unsupported_dtypes_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_to_large_input_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_zero_size_index_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_serialization_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_default_tensor_type_warnings_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_shift_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_skip_xla_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_all_devices_non_blocking_False_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_all_devices_non_blocking_True_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_qint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_qint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_quint4x2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_quint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_strides_propagation_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_sync_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_complex64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_set_errors_multigpu_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_shape_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_type_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_ternary_op_mem_overlap_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_untyped_storage_meta_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_warn_always_caught_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_where_scalar_handcrafted_values_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_advancedindex_mixed_cpu_devices_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_advancedindex_mixed_devices_error_cuda, 
test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_float32, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_float64, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_int64, test/test_torch.py::TestDevicePrecisionCUDA::test_copy_broadcast_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_copy_noncontig_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_cuda_device_idx_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_device_serialization_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float16, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float32, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float64, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int16, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int32, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int64, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int8, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_uint8, test/test_torch.py::TestDevicePrecisionCUDA::test_index_add_bfloat16_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_multidevice_serialization_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_type_conversions_same_device_cuda
2025-12-04T15:21:47.2765405Z 
2025-12-04T15:21:47.2765777Z Finished test_torch 1/1 ... [2025-12-04 15:21:47.213468][22179.222691672], took 1.40min
2025-12-04T15:21:47.2766678Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_torch/test_torch-6322eeaa434bd119.xml
2025-12-04T15:21:47.3459487Z Running xpu/test_gemm 1/1 ... [2025-12-04 15:21:47.345546][22179.354766493]
2025-12-04T15:21:47.3459925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:47.3462635Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:47.345915]
2025-12-04T15:21:51.4678816Z 
2025-12-04T15:21:51.4679728Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_db81f0dcd896f79f_.log
2025-12-04T15:21:51.4680422Z Running 0 items in this shard:
2025-12-04T15:21:51.4680620Z 
2025-12-04T15:21:51.4680884Z Finished xpu/test_gemm 1/1 ... [2025-12-04 15:21:51.467487][22183.476711952], took 0.07min
2025-12-04T15:21:51.4917924Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-6cf9ed264c8fa189.xml
2025-12-04T15:21:51.5354491Z Running test_binary_ufuncs 1/1 ... [2025-12-04 15:21:51.534999][22183.544224671]
2025-12-04T15:21:51.5355060Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:51.5357798Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_binary_ufuncs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:51.535354]
2025-12-04T15:26:18.5305565Z 
2025-12-04T15:26:18.5309264Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_d43f59e69a692663_.log
2025-12-04T15:26:19.0818977Z Running 12917 items in this shard: test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_broadcast_empty_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_with_tail_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addcmul_scalars_as_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addsub_half_tensor_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_edgecases_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_scalar_device_unspecified_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_ops_with_scalars_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bool_tensor_comparison_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cmul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpu_tensor_pow_cuda_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cremainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_binary_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_inplace_error_msg_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_csub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cuda_tensor_pow_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cumulative_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_script_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divmul_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_exceptions_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_idiv_and_ifloordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_division_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_dunders_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_and_float_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_tensor_pow_neg_ints_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_with_nontrivial_alignment_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_long_tensor_pow_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_forward_ad_float32_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_chalf_tensor_and_cpu_scalar_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_bfloat16_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_float_power_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_trunc_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_or_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_he_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_out_resize_warning_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_inplace_resizing_exception_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_base_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_overloads_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_overflow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rpow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lcm_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_typing_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_tensor_pow_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___radd___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rand___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rdiv___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmod___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmul___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___ror___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rpow___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rsub___cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rxor___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmin_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_pow_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_eq_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_return_by_ref_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lt_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_max_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_min_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_h_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_he_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_laguerre_polynomial_l_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_legendre_polynomial_p_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_u_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_bfloat16_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_gradients_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_uint8
2025-12-04T15:26:19.6614737Z 
2025-12-04T15:26:19.6615044Z Finished test_binary_ufuncs 1/1 ... [2025-12-04 15:26:18.553447][22450.562663354], took 4.45min
2025-12-04T15:26:19.6616057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-510898c7a9dfb9c9.xml
2025-12-04T15:26:19.6616978Z Running test_modules 2/4 ... [2025-12-04 15:26:18.966945][22450.976162666]
2025-12-04T15:26:19.6617398Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:26:19.6618394Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:26:18.967409]
2025-12-04T15:34:16.1094400Z 
2025-12-04T15:34:16.1095638Z test_modules 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_2.4_d8a3e6157b79afbb_.log
2025-12-04T15:34:16.1445163Z Running 909 items in this shard: test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SELU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardtanh_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_complex128, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv1d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Sigmoid_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRUCell_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveAvgPool2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SoftMarginLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiLabelSoftMarginLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiheadAttention_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_RNN_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoder_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HingeEmbeddingLoss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_PReLU_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm3d_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm2d_eval_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCEWithLogitsLoss_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HingeEmbeddingLoss_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_train_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardtanh_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_SELU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm3d_eval_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTM_eval_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SmoothL1Loss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm2d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CTCLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad1d_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConstantPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CrossEntropyLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRUCell_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRU_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GroupNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardswish_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_HingeEmbeddingLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_L1Loss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LayerNorm_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LayerNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Linear_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LocalResponseNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSigmoid_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool3d_swap_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PoissonNLLLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNNCell_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNN_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SiLU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SoftMarginLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmin_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softplus_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softshrink_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanh_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Threshold_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerDecoderLayer_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoderLayer_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_True_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCELoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GELU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardswish_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardtanh_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTMCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LeakyReLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LeakyReLU_swap_False_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSoftmax_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MSELoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiMarginLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_True_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_RMSNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SiLU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanh_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_False_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_True_set_grad_True_cuda_float32
2025-12-04T15:34:16.1782824Z 
2025-12-04T15:34:16.1783105Z Finished test_modules 2/4 ... [2025-12-04 15:34:16.110571][22928.119794958], took 7.95min
2025-12-04T15:34:16.1784154Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_modules/test_modules-1ceed37f0450876d.xml
2025-12-04T15:34:16.2354403Z Running torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 15:34:16.235040][22928.244262048]
2025-12-04T15:34:16.2355095Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:16.2358071Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/linalg/test_linalg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:16.235398]
2025-12-04T15:34:27.8704153Z 
2025-12-04T15:34:27.8705516Z torch_np/numpy_tests/linalg/test_linalg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_3f3446ecd43fd597_.log
2025-12-04T15:34:27.8806186Z Running 268 items in this shard: test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size_k, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_sq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_identity, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_empty_herm_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_basic_nonsvd, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_nan, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_stacked_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_empty_herm_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_zero, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_0_n_rhs_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_2_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_future_rcond, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_incompatible_dims, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNorm_NonSystematic::test_intmin, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_empty, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_matrix_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_reduced_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_symmetric_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_all_but_economic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_raw, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_3_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_byteorder_check, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_generalized_raise_multiloop, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_sdot_bug_8577, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_xerbla_override, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_dynamic_programming_optimization, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_three_arguments, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_two_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_logic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_optimization_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_three_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_too_few_input_arrays, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_two_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_and_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_-2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_result, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a0_axes0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a1_axes1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_dot, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_geqrf_lwork_smoketest, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_unsupported_commontype
2025-12-04T15:34:27.8905058Z 
2025-12-04T15:34:27.8905486Z Finished torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 15:34:27.870514][22939.879736442], took 0.19min
2025-12-04T15:34:27.8958586Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-320a7bc7a2da135c.xml
2025-12-04T15:34:27.9893105Z Running torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 15:34:27.988940][22939.998162484]
2025-12-04T15:34:27.9893933Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:27.9896320Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:27.989251]
2025-12-04T15:34:32.0117001Z 
2025-12-04T15:34:32.0118044Z torch_np/numpy_tests/core/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_bb9947961cd52757_.log
2025-12-04T15:34:32.0164835Z Running 102 items in this shard: test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_equivalent_dtype_hashing, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_invalid_types, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bool, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bytes0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Datetime64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Object0, 
test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Str0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Timedelta64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Void0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation2, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation3, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_equality, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t2, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_non_writable_attributes_deletion, 
test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_DType11, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_bool__10, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex128_4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex64_3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float16_0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float32_1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float64_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int16_7, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int32_8, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int64_9, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int8_6, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_uint8_5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_complex64_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float16_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float32_complex64_None, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_4294967295_expected1_expected_weak1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_65535_expected0_expected_weak0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other5_expected5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other6_expected6, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes5_expected5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes6_expected6, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes7_expected7, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes8_expected8, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes9_expected9, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_18446744073709551616, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_200, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_4294967296, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_9223372036854775808, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_dtypes_are_true, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_keyword_argument, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_recursion, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_simple, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_?, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_B, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_D, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_F, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_b, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_d, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_e, 
test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_f, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_h, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_i, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_l, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_scalar, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_3
2025-12-04T15:34:32.0209425Z 
2025-12-04T15:34:32.0209903Z Finished torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 15:34:32.011476][22944.020698739], took 0.07min
2025-12-04T15:34:32.0368067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-9c6a851d43187f63.xml
2025-12-04T15:34:32.0748314Z Running lazy/test_debug_util 1/1 ... [2025-12-04 15:34:32.074465][22944.083689789]
2025-12-04T15:34:32.0748951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:32.0752074Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_debug_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:32.074834]
2025-12-04T15:34:35.8464023Z 
2025-12-04T15:34:35.8464997Z lazy/test_debug_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_debug_util_1.1_6159721dd42cd649_.log
2025-12-04T15:34:35.8466112Z Running 1 items in this shard: test/lazy/test_debug_util.py::DebugUtilTest::test_get_python_frames
2025-12-04T15:34:35.8466585Z 
2025-12-04T15:34:35.8467072Z Finished lazy/test_debug_util 1/1 ... [2025-12-04 15:34:35.846085][22947.855309677], took 0.06min
2025-12-04T15:34:35.8712749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-612fe6974f2e86fb.xml
2025-12-04T15:34:35.9233223Z Running nn/test_load_state_dict 1/1 ... [2025-12-04 15:34:35.922971][22947.932196328]
2025-12-04T15:34:35.9234066Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:35.9236376Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_load_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:35.923285]
2025-12-04T15:34:40.2456512Z 
2025-12-04T15:34:40.2458035Z nn/test_load_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_load_state_dict_1.1_1f7336ad32e96ae1_.log
2025-12-04T15:34:40.2470892Z Running 29 items in this shard: test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_ref_cycle_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_True, 
test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_False, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_True
2025-12-04T15:34:40.2482939Z 
2025-12-04T15:34:40.2483376Z Finished nn/test_load_state_dict 1/1 ... [2025-12-04 15:34:40.245313][22952.254538145], took 0.07min
2025-12-04T15:34:40.2706517Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-573eaa6de6818c33.xml
2025-12-04T15:34:40.3058571Z Running test_shape_ops 1/1 ... [2025-12-04 15:34:40.305419][22952.314643888]
2025-12-04T15:34:40.3059323Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:40.3060775Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_shape_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:40.305721]
2025-12-04T15:34:45.5314239Z 
2025-12-04T15:34:45.5315260Z test_shape_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_shape_ops_1.1_17556160abffc005_.log
2025-12-04T15:34:45.5344528Z Running 99 items in this shard: test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_propagates_nans_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_raises_arg_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_multidim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float32, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_large_tensor_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint2x4, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint4x2, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_int64, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_astuple_out_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_discontiguous_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_no_warning_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_non_diff_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_rot90_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_complex128, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_tolist_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_unbind_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_all_devices_and_dtypes_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_backward_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_scalars_cuda
2025-12-04T15:34:45.5372859Z 
2025-12-04T15:34:45.5373109Z Finished test_shape_ops 1/1 ... [2025-12-04 15:34:45.531050][22957.540274024], took 0.09min
2025-12-04T15:34:45.5567659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_shape_ops/test_shape_ops-8ae5e584fb53bb5e.xml
2025-12-04T15:34:45.6071124Z Running profiler/test_memory_profiler 1/1 ... [2025-12-04 15:34:45.606777][22957.616001326]
2025-12-04T15:34:45.6071878Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:45.6075659Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_memory_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:45.607097]
2025-12-04T15:34:53.3864828Z 
2025-12-04T15:34:53.3865910Z profiler/test_memory_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_memory_profiler_1.1_f20e3ab107ff598c_.log
2025-12-04T15:34:53.3880604Z Running 33 items in this shard: test/profiler/test_memory_profiler.py::TestMemoryProfiler::test_config_check, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module_and_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer_set_to_none, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_low_level, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_complicated, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_non_op_allocations, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_inplace, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_stacked, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_with_annotations, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_tensorlist, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd, 
test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_lazy, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_lazily_initialized, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_manual_optimizer_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_memory_timeline, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients_set_to_none, test/profiler/test_memory_profiler.py::TestMemoryProfilerTimelineCUDA::test_memory_timeline_no_id_cuda
2025-12-04T15:34:53.3894827Z 
2025-12-04T15:34:53.3895283Z Finished profiler/test_memory_profiler 1/1 ... [2025-12-04 15:34:53.386130][22965.395353946], took 0.13min
2025-12-04T15:34:53.4120710Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-419c9aea1e4e06f2.xml
2025-12-04T15:34:53.4851715Z Running test_indexing 1/1 ... [2025-12-04 15:34:53.484791][22965.494012062]
2025-12-04T15:34:53.4852429Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:53.4856762Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:53.485183]
2025-12-04T15:35:16.7497334Z 
2025-12-04T15:35:16.7498483Z test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_indexing_1.1_fbbd66d5cf2cd3ea_.log
2025-12-04T15:35:16.7558464Z Running 186 items in this shard: test/test_indexing.py::TestIndexingCUDA::test_advancedindex_big_cuda, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_basic_advanced_combined_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_tensor_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_cpu_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_cuda_broadcast_index_use_deterministic_algorithms_cuda, test/test_indexing.py::TestIndexingCUDA::test_ellipsis_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_bool_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_slice_cuda, test/test_indexing.py::TestIndexingCUDA::test_errors_index_copy_cuda, test/test_indexing.py::TestIndexingCUDA::test_gather_take_along_dim_cross_device_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_getitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_add_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex64, 
test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float16, 
test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_getitem_copy_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_ind_dtype_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_limits_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_duplicate_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_empty_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_expanded_values_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_large_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_non_contiguous_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_deterministic_with_optional_tensors_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_large_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_non_accumulate_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e5m2, 
test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float64, 
test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_scalar_with_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2, 
test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_setitem_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_int_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_broadcast_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_device_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_jit_indexing_cuda, test/test_indexing.py::TestIndexingCUDA::test_list_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_none_cuda, test/test_indexing.py::TestIndexingCUDA::test_out_of_bound_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_set_item_to_scalar_tensor_cuda, 
test/test_indexing.py::TestIndexingCUDA::test_setitem_expansion_error_cuda, test/test_indexing.py::TestIndexingCUDA::test_setitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_single_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_cuda, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_unravel_index_errors_cuda, test/test_indexing.py::TestIndexingCUDA::test_variable_slicing_cuda, test/test_indexing.py::TestIndexingCUDA::test_zero_dim_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_assignment_value_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_alldims_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_onedim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_twodim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_tensors_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_list_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_shape_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broadcast_subspace_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broaderrors_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_ellipsis_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_fancy_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_tuple_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_everything_returns_views_cuda, test/test_indexing.py::NumpyTestsCUDA::test_index_is_larger_cuda, 
test/test_indexing.py::NumpyTestsCUDA::test_index_no_floats_cuda, test/test_indexing.py::NumpyTestsCUDA::test_none_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_bool_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_int_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_trivial_fancy_out_of_bounds_cuda, test/test_indexing.py::NumpyTestsCUDA::test_truncate_leading_1s_cuda
2025-12-04T15:35:16.7617366Z 
2025-12-04T15:35:16.7617754Z Finished test_indexing 1/1 ... [2025-12-04 15:35:16.749698][22988.758921109], took 0.39min
2025-12-04T15:35:16.7756747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_indexing/test_indexing-bb3db4f55bab2e87.xml
2025-12-04T15:35:16.8707512Z Running test_type_info 1/1 ... [2025-12-04 15:35:16.870278][22988.879502193]
2025-12-04T15:35:16.8708197Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:16.8709979Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_info.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:16.870630]
2025-12-04T15:35:20.5418978Z 
2025-12-04T15:35:20.5419988Z test_type_info 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_info_1.1_02020d4e7679db8b_.log
2025-12-04T15:35:20.5422469Z Running 5 items in this shard: test/test_type_info.py::TestDTypeInfo::test_finfo, test/test_type_info.py::TestDTypeInfo::test_iinfo, test/test_type_info.py::TestDTypeInfo::test_invalid_input, test/test_type_info.py::TestDTypeInfo::test_to_complex, test/test_type_info.py::TestDTypeInfo::test_to_real
2025-12-04T15:35:20.5423794Z 
2025-12-04T15:35:20.5424081Z Finished test_type_info 1/1 ... [2025-12-04 15:35:20.541522][22992.550745742], took 0.06min
2025-12-04T15:35:20.5677951Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_info/test_type_info-3cbecfd6afe8711f.xml
2025-12-04T15:35:20.6050101Z Running functorch/test_aotdispatch 1/1 ... [2025-12-04 15:35:20.604648][22992.613872504]
2025-12-04T15:35:20.6050736Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:20.6054029Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aotdispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:20.605005]
2025-12-04T15:37:22.8417134Z 
2025-12-04T15:37:22.8420883Z functorch/test_aotdispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aotdispatch_1.1_73fa05bc552fde2d_.log
2025-12-04T15:37:22.8667981Z Running 537 items in this shard: test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_custom, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization2, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata2, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_mutate_multiple, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_module, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_no_grad_input_output, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_view_meta_replay, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed_mode, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_ban_dropout_mut_pre_dispatch, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_multiple_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_no_buffer_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_functionalized_rng_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_dupes_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_input_requiring_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_parameter_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_metadata_mutation_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_module_joint, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_multiple_outputs_require_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_buffer_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_inplace, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_linear, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_contiguous, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_conv_and_bn, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_composite_implicit, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_simple, 
test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_view, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_1, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_2, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_outdtype, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_reshape, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_autograd_op, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond_nested, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_basic, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_pytrees_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_synthetic_bases_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_unbacked_arg, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_with_torch_cond, test/functorch/test_aotdispatch.py::TestPartitioning::test_autocast, test/functorch/test_aotdispatch.py::TestPartitioning::test_contiguous, test/functorch/test_aotdispatch.py::TestPartitioning::test_custom_partitioner_fn, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_getitem, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_generate_gives_inference_graph, test/functorch/test_aotdispatch.py::TestPartitioning::test_meta_tensor_inplace_op, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_output_tensor_shape_tensor, 
test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_raise_getitems, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_save_shape, test/functorch/test_aotdispatch.py::TestPartitioning::test_preserve_random, test/functorch/test_aotdispatch.py::TestPartitioning::test_quantize_activation_duplicate_nodes, test/functorch/test_aotdispatch.py::TestPartitioning::test_recompute_partitioning, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_incorrect_backward, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_inference, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad_views, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_simple, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_dynamic, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_fake_tensor_gm_raises, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace_from_mutation, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_test_subclasses_with_tensor_factories, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_flex_attn_noncontiguous_tangents, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_dense, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_tensor_tangent, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inductor_freezing_with_subclasses, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inference_python_dispatcher, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_layer_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_lift_fresh_copy_in_graph, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cuda, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rms_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_all, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_donated, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_no_static, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_donated_buffers, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_params, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_recompile, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters_torture_case, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_tangent_type_coercion, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_wrong_guess_tangent_type, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_on, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization1, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_hidden_from_autograd_aliasing, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down_and_set_, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_flipped, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_off, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization1, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_hidden_from_autograd_aliasing, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down_and_set_, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_flipped, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_detach
2025-12-04T15:37:22.8911102Z 
2025-12-04T15:37:22.8911490Z Finished functorch/test_aotdispatch 1/1 ... [2025-12-04 15:37:22.842646][23114.85186094], took 2.04min
2025-12-04T15:37:22.8912790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-3265775c77799c99.xml
2025-12-04T15:37:22.9596882Z Running test_scatter_gather_ops 1/1 ... [2025-12-04 15:37:22.959339][23114.968561261]
2025-12-04T15:37:22.9597595Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:37:22.9600415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_scatter_gather_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:37:22.959670]
2025-12-04T15:37:42.5608776Z 
2025-12-04T15:37:42.5609667Z test_scatter_gather_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scatter_gather_ops_1.1_e624bed173f96ebf_.log
2025-12-04T15:37:42.5641742Z Running 76 items in this shard: test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_bool_cuda_bool, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_broadcasted_index_deterministic_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_mult_index_base_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_bfloat16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float64, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex128, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_uint8
2025-12-04T15:37:42.5672660Z 
2025-12-04T15:37:42.5672985Z Finished test_scatter_gather_ops 1/1 ... [2025-12-04 15:37:42.560625][23134.569849357], took 0.33min
2025-12-04T15:37:42.5874007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5e8dbe55d5e60a97.xml
2025-12-04T15:37:42.6604747Z Running test_cuda_multigpu 1/1 ... [2025-12-04 15:37:42.660083][23134.669306695]
2025-12-04T15:37:42.6605415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:37:42.6609044Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_multigpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:37:42.660403]
2025-12-04T15:37:46.9826127Z 
2025-12-04T15:37:46.9826993Z test_cuda_multigpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_multigpu_1.1_134114cd1fad822a_.log
2025-12-04T15:37:46.9845596Z Running 61 items in this shard: test/test_cuda_multigpu.py::TestCudaMultiGPU::test_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_caching_pinned_memory_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cat_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_device_memory_allocated, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_init_race, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_memory_leak_detection, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_set_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_synchronize, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_current_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_default_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_elapsed_time, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_wait, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams_multi_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_get_set_rng_state_all, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_device_as_key, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_scale, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_load_nonexistent_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_mem_get_info, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap, 
test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap_dict, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_storage_clone, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_new, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_rng_state_offset, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_context, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_nogil, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streaming_backwards_device_transfer, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_eq, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_priority, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_tensor_device, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_empty_tensors, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_gather, test/test_cuda_multigpu.py::TestCudaComm::test_gather_dim, test/test_cuda_multigpu.py::TestCudaComm::test_gather_namedtuple, test/test_cuda_multigpu.py::TestCudaComm::test_gather_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_memory_format_scatter_gather, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_dim, 
test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_namedtuple
2025-12-04T15:37:46.9873949Z 
2025-12-04T15:37:46.9874221Z Finished test_cuda_multigpu 1/1 ... [2025-12-04 15:37:46.982300][23138.991524685], took 0.07min
2025-12-04T15:37:47.0091817Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-339f2b8a0ba2c562.xml
2025-12-04T15:37:47.0485589Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 15:37:47.048206][23139.057429969]
2025-12-04T15:37:47.0486254Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:37:47.0489571Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:37:47.048525]
2025-12-04T15:37:51.0709768Z 
2025-12-04T15:37:51.0710858Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a7d224f05328be14_.log
2025-12-04T15:37:51.0729613Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex
2025-12-04T15:37:51.0747699Z 
2025-12-04T15:37:51.0748089Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 15:37:51.070659][23143.079884086], took 0.07min
2025-12-04T15:37:51.0977333Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-7a9eb44e36e96ef2.xml
2025-12-04T15:37:51.1319042Z Running test_jit_autocast 1/1 ... [2025-12-04 15:37:51.131567][23143.140792043]
2025-12-04T15:37:51.1319692Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:37:51.1322625Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:37:51.131869]
2025-12-04T15:38:17.7185824Z 
2025-12-04T15:38:17.7186574Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_449f99b0d0d7aa89_.log
2025-12-04T15:38:17.7202638Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, 
test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing
2025-12-04T15:38:17.7218475Z 
2025-12-04T15:38:17.7218742Z Finished test_jit_autocast 1/1 ... [2025-12-04 15:38:17.718298][23169.727522524], took 0.44min
2025-12-04T15:38:17.7458545Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-8a1338a601c4ef0b.xml
2025-12-04T15:38:17.8683418Z Running test_xnnpack_integration 1/1 ... [2025-12-04 15:38:17.867987][23169.877209737]
2025-12-04T15:38:17.8683908Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:17.8686858Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:17.868302]
2025-12-04T15:38:29.6903009Z 
2025-12-04T15:38:29.6904032Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_ef1a45d9c52ae3ce_.log
2025-12-04T15:38:29.6908794Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc
2025-12-04T15:38:29.6913188Z 
2025-12-04T15:38:29.6913499Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 15:38:29.689941][23181.699165299], took 0.20min
2025-12-04T15:38:29.7172629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-d08ca7b1f6355251.xml
2025-12-04T15:38:29.7982270Z Running nn/test_init 1/1 ... [2025-12-04 15:38:29.797766][23181.80698916]
2025-12-04T15:38:29.7982715Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:29.7984974Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:29.798094]
2025-12-04T15:38:37.2798835Z 
2025-12-04T15:38:37.2799704Z nn/test_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_init_1.1_414026fa8e0e69bb_.log
2025-12-04T15:38:37.2808642Z Running 30 items in this shard: test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu, test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu_only_accepts_numbers, test/nn/test_init.py::TestNNInit::test_calculate_gain_linear, test/nn/test_init.py::TestNNInit::test_calculate_gain_nonlinear, test/nn/test_init.py::TestNNInit::test_calculate_gain_only_accepts_valid_nonlinearities, test/nn/test_init.py::TestNNInit::test_constant, test/nn/test_init.py::TestNNInit::test_deprecation, test/nn/test_init.py::TestNNInit::test_dirac_identity, test/nn/test_init.py::TestNNInit::test_dirac_only_works_on_3_4_5d_inputs, test/nn/test_init.py::TestNNInit::test_dirac_properties, test/nn/test_init.py::TestNNInit::test_eye, test/nn/test_init.py::TestNNInit::test_eye_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_kaiming_normal, test/nn/test_init.py::TestNNInit::test_kaiming_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_normal_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_kaiming_uniform, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_normal, test/nn/test_init.py::TestNNInit::test_ones_and_zeros, test/nn/test_init.py::TestNNInit::test_orthogonal, test/nn/test_init.py::TestNNInit::test_sparse_default_std, test/nn/test_init.py::TestNNInit::test_sparse_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_trunc_normal, test/nn/test_init.py::TestNNInit::test_trunc_normal_generator, test/nn/test_init.py::TestNNInit::test_uniform, test/nn/test_init.py::TestNNInit::test_xavier_normal, test/nn/test_init.py::TestNNInit::test_xavier_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_xavier_uniform, 
test/nn/test_init.py::TestNNInit::test_xavier_uniform_errors_on_inputs_smaller_than_2d
2025-12-04T15:38:37.2816625Z 
2025-12-04T15:38:37.2816861Z Finished nn/test_init 1/1 ... [2025-12-04 15:38:37.279566][23189.288790762], took 0.12min
2025-12-04T15:38:37.3068310Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_init/nn.test_init-bb3f84e769cc626f.xml
2025-12-04T15:38:37.3809953Z Running test_mobile_optimizer 1/1 ... [2025-12-04 15:38:37.380572][23189.389795872]
2025-12-04T15:38:37.3810595Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:37.3813352Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mobile_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:37.380954]
2025-12-04T15:38:43.1062051Z 
2025-12-04T15:38:43.1063005Z test_mobile_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mobile_optimizer_1.1_2406b12c26273884_.log
2025-12-04T15:38:43.1066041Z Running 7 items in this shard: test/test_mobile_optimizer.py::TestOptimizer::test_clone_module_with_class, test/test_mobile_optimizer.py::TestOptimizer::test_generate_mobile_module_lints, test/test_mobile_optimizer.py::TestOptimizer::test_hoist_conv_packed_params, test/test_mobile_optimizer.py::TestOptimizer::test_mobilenet_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_preserve_bundled_inputs_methods, test/test_mobile_optimizer.py::TestOptimizer::test_quantized_conv_no_asan_failures
2025-12-04T15:38:43.1068597Z 
2025-12-04T15:38:43.1334824Z Finished test_mobile_optimizer 1/1 ... [2025-12-04 15:38:43.105831][23195.115055521], took 0.10min
2025-12-04T15:38:43.1336921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-081f0752aeda15ae.xml
2025-12-04T15:38:43.1674996Z Running test_type_promotion 1/1 ... [2025-12-04 15:38:43.167094][23195.176319636]
2025-12-04T15:38:43.1675598Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:43.1677474Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_promotion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:43.167401]
2025-12-04T15:38:56.7062658Z 
2025-12-04T15:38:56.7064771Z test_type_promotion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_promotion_1.1_a64bbb5536dae6ab_.log
2025-12-04T15:38:56.7244310Z Running 423 items in this shard: test/test_type_promotion.py::TestTypePromotionCUDA::test_add_wrapped_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alpha_mismatch_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alternate_result_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_bfloat16_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_booleans_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_can_cast_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_out_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_comparison_ops_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_assertraises_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_scalar_mult_tensor_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_computation_ignores_out_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_create_bool_tensors_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_float_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_from_issue_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_fail_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_inplace_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_to_float_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_lt_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_many_promotions_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_mixed_type_backward_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_non_promoting_ops_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float16, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_uint8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex128, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_self_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_types_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float16, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex128, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_tensor_vs_scalar_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_add_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_mul_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_sub_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_ternary_out_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_transpose_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex128, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unsigned_cuda
2025-12-04T15:38:56.7421474Z 
2025-12-04T15:38:56.7421758Z Finished test_type_promotion 1/1 ... [2025-12-04 15:38:56.706711][23208.715935289], took 0.23min
2025-12-04T15:38:56.7422772Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_promotion/test_type_promotion-3f39f26aca555a70.xml
2025-12-04T15:38:58.2318947Z Uploading artifacts took 1.41 seconds
2025-12-04T15:38:58.2322441Z Running test_reductions 1/1 ... [2025-12-04 15:38:58.231933][23210.241156073]
2025-12-04T15:38:58.2322921Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:58.2326710Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:58.232331]
2025-12-04T15:41:45.1273942Z 
2025-12-04T15:41:45.1274723Z test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_reductions_1.1_4c27d813839f98a0_.log
2025-12-04T15:41:45.3169726Z Running 4759 items in this shard: test/test_reductions.py::TestReductionsCUDA::test_accreal_type_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_all_any_with_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_issue117215_cuda, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_amin_amax_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_axis_with_dim_one_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_large_axis_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_bincount_cuda, test/test_reductions.py::TestReductionsCUDA::test_bucketization_cuda, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_cumprod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_cumsum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_unbiased_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_linalg_vector_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_prod_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nanmean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nansum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_less_than_64_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histogram_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogram_error_handling_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogramdd_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_integral_promotion_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_max_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mean_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_int_with_optdtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_corner_cases_cuda, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_max_nan_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_minmax_illegal_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_boolean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_device_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_numpy_named_args_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_bool_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_quantile_backward_cuda, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_quantile_error_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_reduce_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_empty_any_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_split_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_repeated_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_as_dim_argument_cuda, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_dim_compiled_mode_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_cpu_device_mismatch_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_reduction_uint8_overflow_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_out_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_parallel_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_argmax_argmix_kthvalue_dim_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_reduce_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_large_input_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability2_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float64
2025-12-04T15:41:45.5005914Z 
2025-12-04T15:41:45.5006190Z Finished test_reductions 1/1 ... [2025-12-04 15:41:45.134949][23377.144173391], took 2.78min
2025-12-04T15:41:45.5007193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_reductions/test_reductions-31a848701d5079bd.xml
2025-12-04T15:41:45.5008289Z Running test_autoload_disable 1/1 ... [2025-12-04 15:41:45.324617][23377.333838102]
2025-12-04T15:41:45.6606079Z Processing /var/lib/jenkins/workspace/test/cpp_extensions
2025-12-04T15:41:48.9058521Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T15:41:48.9079365Z Building wheels for collected packages: torch_test_cpp_extension
2025-12-04T15:43:15.1437700Z   Building wheel for torch_test_cpp_extension (pyproject.toml) ... done
2025-12-04T15:43:15.1563988Z   Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13199657 sha256=7c07cad18ea0e6d31f276459cef3d32a7a1ce159eb926509a0f6578be4510701
2025-12-04T15:43:15.1565410Z   Stored in directory: /tmp/pip-ephem-wheel-cache-z0r_xujv/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1
2025-12-04T15:43:15.1582836Z Successfully built torch_test_cpp_extension
2025-12-04T15:43:15.5268441Z Installing collected packages: torch_test_cpp_extension
2025-12-04T15:43:15.7424292Z Successfully installed torch_test_cpp_extension-0.0.0
2025-12-04T15:43:18.3791860Z 
2025-12-04T15:43:18.3792117Z Running tests...
2025-12-04T15:43:18.3792408Z ----------------------------------------------------------------------
2025-12-04T15:43:18.7253808Z .
2025-12-04T15:43:18.7254224Z ----------------------------------------------------------------------
2025-12-04T15:43:18.7254608Z Ran 1 test in 0.346s
2025-12-04T15:43:18.7254756Z 
2025-12-04T15:43:18.7254839Z OK
2025-12-04T15:43:18.7254942Z 
2025-12-04T15:43:18.7255043Z Generating XML reports...
2025-12-04T15:43:19.4411623Z Finished test_autoload_disable 1/1 ... [2025-12-04 15:43:19.440603][23471.449818604], took 1.57min
2025-12-04T15:43:19.4690874Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204154318.xml
2025-12-04T15:43:23.3789543Z Running test batch 'tests to run' cost 22539.08 seconds
2025-12-04T15:43:23.3803411Z Emitting td_test_failure_stats_v2
2025-12-04T15:43:23.3806508Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f7712824d12711f081bf0242ac110002
2025-12-04T15:43:23.5211262Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f7712824d12711f081bf0242ac110002 
2025-12-04T15:43:23.5228134Z Emitting td_test_failure_stats_v2
2025-12-04T15:43:23.5229613Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f786de9ed12711f081bf0242ac110002
2025-12-04T15:43:23.5535525Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f786de9ed12711f081bf0242ac110002 
2025-12-04T15:43:23.5545696Z Emitting td_test_failure_stats_v2
2025-12-04T15:43:23.5547934Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f78bb8d8d12711f081bf0242ac110002
2025-12-04T15:43:23.5887436Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f78bb8d8d12711f081bf0242ac110002 
2025-12-04T15:43:23.5888474Z inductor/test_fp8 1/1 failed!
2025-12-04T15:43:23.5888761Z test_cuda 1/1 failed!
2025-12-04T15:43:23.5889083Z test_sparse 1/1 failed!
2025-12-04T15:43:24.3319618Z 
2025-12-04T15:43:24.3319929Z real	375m45.282s
2025-12-04T15:43:24.3320205Z user	376m19.018s
2025-12-04T15:43:24.3320414Z sys	36m48.682s
2025-12-04T15:43:24.3320641Z + sccache_epilogue
2025-12-04T15:43:24.3320917Z + echo '::group::Sccache Compilation Log'
2025-12-04T15:43:24.3322037Z ##[group]Sccache Compilation Log
2025-12-04T15:43:24.3322382Z + echo '=================== sccache compilation log ==================='
2025-12-04T15:43:24.3322782Z =================== sccache compilation log ===================
2025-12-04T15:43:24.3323407Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T15:43:24.3471392Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T15:43:24.3472082Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T15:43:24.3472559Z + sccache --show-stats
2025-12-04T15:43:24.3508814Z Compile requests                   3479
2025-12-04T15:43:24.3509624Z Compile requests executed           347
2025-12-04T15:43:24.3509974Z Cache hits                          166
2025-12-04T15:43:24.3510266Z Cache hits (C/C++)                  166
2025-12-04T15:43:24.3510559Z Cache misses                        181
2025-12-04T15:43:24.3510856Z Cache misses (C/C++)                181
2025-12-04T15:43:24.3511162Z Cache hits rate                   47.84 %
2025-12-04T15:43:24.3511466Z Cache hits rate (C/C++)           47.84 %
2025-12-04T15:43:24.3511777Z Cache timeouts                        0
2025-12-04T15:43:24.3512065Z Cache read errors                     0
2025-12-04T15:43:24.3512370Z Forced recaches                       0
2025-12-04T15:43:24.3512659Z Cache write errors                    0
2025-12-04T15:43:24.3512942Z Cache errors                          0
2025-12-04T15:43:24.3513232Z Compilations                        181
2025-12-04T15:43:24.3513557Z Compilation failures                  0
2025-12-04T15:43:24.3513907Z Non-cacheable compilations            0
2025-12-04T15:43:24.3514220Z Non-cacheable calls                 173
2025-12-04T15:43:24.3514558Z Non-compilation calls              2959
2025-12-04T15:43:24.3514970Z Unsupported compiler calls            0
2025-12-04T15:43:24.3515415Z Average cache write               0.049 s
2025-12-04T15:43:24.3515780Z Average compiler                  5.973 s
2025-12-04T15:43:24.3516115Z Average cache read hit            0.031 s
2025-12-04T15:43:24.3516428Z Failed distributed compilations       0
2025-12-04T15:43:24.3516649Z 
2025-12-04T15:43:24.3516748Z Non-cacheable reasons:
2025-12-04T15:43:24.3517024Z unknown source language             138
2025-12-04T15:43:24.3517338Z -E                                   35
2025-12-04T15:43:24.3517562Z 
2025-12-04T15:43:24.3517805Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T15:43:24.3518273Z Version (client)                0.10.0
2025-12-04T15:43:24.3518599Z + sccache --stop-server
2025-12-04T15:43:24.3540038Z Stopping sccache server...
2025-12-04T15:43:24.3543747Z Compile requests                   3479
2025-12-04T15:43:24.3544097Z Compile requests executed           347
2025-12-04T15:43:24.3544418Z Cache hits                          166
2025-12-04T15:43:24.3544720Z Cache hits (C/C++)                  166
2025-12-04T15:43:24.3545016Z Cache misses                        181
2025-12-04T15:43:24.3545316Z Cache misses (C/C++)                181
2025-12-04T15:43:24.3545611Z Cache hits rate                   47.84 %
2025-12-04T15:43:24.3545921Z Cache hits rate (C/C++)           47.84 %
2025-12-04T15:43:24.3546233Z Cache timeouts                        0
2025-12-04T15:43:24.3546518Z Cache read errors                     0
2025-12-04T15:43:24.3546813Z Forced recaches                       0
2025-12-04T15:43:24.3547105Z Cache write errors                    0
2025-12-04T15:43:24.3547389Z Cache errors                          0
2025-12-04T15:43:24.3547680Z Compilations                        181
2025-12-04T15:43:24.3547986Z Compilation failures                  0
2025-12-04T15:43:24.3548370Z Non-cacheable compilations            0
2025-12-04T15:43:24.3548680Z Non-cacheable calls                 173
2025-12-04T15:43:24.3548982Z Non-compilation calls              2959
2025-12-04T15:43:24.3549457Z Unsupported compiler calls            0
2025-12-04T15:43:24.3549833Z Average cache write               0.049 s
2025-12-04T15:43:24.3550205Z Average compiler                  5.973 s
2025-12-04T15:43:24.3550565Z Average cache read hit            0.031 s
2025-12-04T15:43:24.3550888Z Failed distributed compilations       0
2025-12-04T15:43:24.3551219Z 
2025-12-04T15:43:24.3551314Z Non-cacheable reasons:
2025-12-04T15:43:24.3551574Z unknown source language             138
2025-12-04T15:43:24.3551866Z -E                                   35
2025-12-04T15:43:24.3552070Z 
2025-12-04T15:43:24.3552303Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T15:43:24.3552748Z Version (client)                0.10.0
2025-12-04T15:43:24.3561261Z + echo ::endgroup::
2025-12-04T15:43:24.3561900Z ##[endgroup]
2025-12-04T15:43:24.3562122Z + cleanup_workspace
2025-12-04T15:43:24.3562617Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T15:43:24.3563402Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T15:43:24.3564030Z + echo '    sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T15:43:24.3564495Z     sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T15:43:24.3565079Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T15:43:24.3565738Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T15:43:24.3566294Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T15:43:25.4614688Z ##[error]Process completed with exit code 1.
2025-12-04T15:43:25.4687171Z Prepare all required actions
2025-12-04T15:43:25.4687540Z Getting action download info
2025-12-04T15:43:25.6354105Z ##[group]Run ./.github/actions/pytest-cache-upload
2025-12-04T15:43:25.6354433Z with:
2025-12-04T15:43:25.6354637Z   cache_dir: .pytest_cache
2025-12-04T15:43:25.6354889Z   shard: 2
2025-12-04T15:43:25.6355121Z   sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T15:43:25.6355443Z   test_config: default
2025-12-04T15:43:25.6355828Z   job_identifier: periodic_linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T15:43:25.6356292Z env:
2025-12-04T15:43:25.6356496Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:25.6356757Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:25.6357056Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:25.6357611Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:25.6358114Z ##[endgroup]
2025-12-04T15:43:25.6391618Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T15:43:25.6391937Z with:
2025-12-04T15:43:25.6392136Z   shell: bash
2025-12-04T15:43:25.6392341Z   timeout_minutes: 5
2025-12-04T15:43:25.6392579Z   max_attempts: 5
2025-12-04T15:43:25.6392804Z   retry_wait_seconds: 30
2025-12-04T15:43:25.6393126Z   command: set -eu
python3 -m pip install boto3==1.35.42

2025-12-04T15:43:25.6393512Z   polling_interval_seconds: 1
2025-12-04T15:43:25.6393783Z   warning_on_retry: true
2025-12-04T15:43:25.6394032Z   continue_on_error: false
2025-12-04T15:43:25.6394277Z env:
2025-12-04T15:43:25.6394480Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:25.6394727Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:25.6395030Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:25.6395597Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:25.6396098Z ##[endgroup]
2025-12-04T15:43:26.1264914Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T15:43:27.4035499Z Collecting boto3==1.35.42
2025-12-04T15:43:27.4446770Z   Downloading boto3-1.35.42-py3-none-any.whl (139 kB)
2025-12-04T15:43:27.4607346Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0)
2025-12-04T15:43:28.8282527Z Collecting botocore<1.36.0,>=1.35.42
2025-12-04T15:43:28.8320434Z   Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB)
2025-12-04T15:43:29.0251330Z Collecting s3transfer<0.11.0,>=0.10.0
2025-12-04T15:43:29.0288694Z   Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB)
2025-12-04T15:43:29.0382979Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1)
2025-12-04T15:43:29.0392177Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10)
2025-12-04T15:43:29.2523763Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0)
2025-12-04T15:43:29.3442818Z Installing collected packages: botocore, s3transfer, boto3
2025-12-04T15:43:29.9688403Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4
2025-12-04T15:43:30.7218914Z Command completed after 1 attempt(s).
2025-12-04T15:43:30.7294938Z ##[group]Run python3 .github/scripts/pytest_cache.py \
2025-12-04T15:43:30.7306934Z python3 .github/scripts/pytest_cache.py \
2025-12-04T15:43:30.7307290Z   --upload \
2025-12-04T15:43:30.7307584Z   --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \
2025-12-04T15:43:30.7308288Z   --pr_identifier "$GITHUB_REF" \
2025-12-04T15:43:30.7308624Z   --job_identifier "$JOB_IDENTIFIER" \
2025-12-04T15:43:30.7308936Z   --sha "$SHA" \
2025-12-04T15:43:30.7309208Z   --test_config "$TEST_CONFIG" \
2025-12-04T15:43:30.7309508Z   --shard "$SHARD" \
2025-12-04T15:43:30.7310001Z   --repo "$REPO" \
2025-12-04T15:43:30.7310291Z   --temp_dir "$RUNNER_TEMP" \
2025-12-04T15:43:30.7325283Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:30.7325655Z env:
2025-12-04T15:43:30.7325862Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:30.7326117Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:30.7326425Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:30.7326988Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:30.7327500Z   CACHE_DIR: .pytest_cache
2025-12-04T15:43:30.7327895Z   JOB_IDENTIFIER: periodic_linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T15:43:30.7328381Z   SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T15:43:30.7328708Z   TEST_CONFIG: default
2025-12-04T15:43:30.7328931Z   SHARD: 2
2025-12-04T15:43:30.7329139Z   REPO: pytorch/pytorch
2025-12-04T15:43:30.7329381Z ##[endgroup]
2025-12-04T15:43:31.1444822Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013`
2025-12-04T15:43:31.1446948Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='periodic_linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='default', shard='2', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None)
2025-12-04T15:43:31.1449083Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache
2025-12-04T15:43:31.1450427Z      to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_8-py3-gcc11-slow-gradcheck/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2
2025-12-04T15:43:31.1452575Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_8-py3-gcc11-slow-gradcheck/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip
2025-12-04T15:43:31.1454616Z        to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_8-py3-gcc11-slow-gradcheck/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip
2025-12-04T15:43:31.2032340Z ##[group]Run cat test/**/*_toprint.log || true
2025-12-04T15:43:31.2032736Z cat test/**/*_toprint.log || true
2025-12-04T15:43:31.2041969Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:31.2042333Z env:
2025-12-04T15:43:31.2042529Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:31.2042805Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:31.2043349Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:31.2043940Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:31.2044434Z ##[endgroup]
2025-12-04T15:43:31.2156707Z cat: 'test/**/*_toprint.log': No such file or directory
2025-12-04T15:43:31.2187972Z ##[group]Run kill "$MONITOR_SCRIPT_PID"
2025-12-04T15:43:31.2188329Z kill "$MONITOR_SCRIPT_PID"
2025-12-04T15:43:31.2196711Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:31.2197076Z env:
2025-12-04T15:43:31.2197278Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:31.2197535Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:31.2197835Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:31.2198390Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:31.2198897Z   MONITOR_SCRIPT_PID: 59410
2025-12-04T15:43:31.2199148Z ##[endgroup]
2025-12-04T15:43:31.2230138Z /home/ec2-user/actions-runner/_work/_temp/6b52e012-4c76-4fc2-a68d-eb54305df0ff.sh: line 1: kill: (59410) - No such process
2025-12-04T15:43:31.2234060Z ##[error]Process completed with exit code 1.
2025-12-04T15:43:31.2362511Z Prepare all required actions
2025-12-04T15:43:31.2362906Z Getting action download info
2025-12-04T15:43:31.4133693Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T15:43:31.6500504Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T15:43:32.1650666Z ##[group]Run ./.github/actions/upload-test-artifacts
2025-12-04T15:43:32.1651020Z with:
2025-12-04T15:43:32.1651367Z   file-suffix: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T15:43:32.1651821Z   s3-bucket: gha-artifacts
2025-12-04T15:43:32.1652071Z env:
2025-12-04T15:43:32.1652262Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:32.1652518Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:32.1652833Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:32.1653384Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:32.1653919Z ##[endgroup]
2025-12-04T15:43:32.1696633Z ##[group]Run # Remove any previous test jsons if they exist
2025-12-04T15:43:32.1697100Z # Remove any previous test jsons if they exist
2025-12-04T15:43:32.1697474Z rm -f test-jsons-*.zip
2025-12-04T15:43:32.1697904Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json'
2025-12-04T15:43:32.1707339Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:32.1707708Z env:
2025-12-04T15:43:32.1708260Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:32.1708553Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:32.1708856Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:32.1709404Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:32.1710047Z   FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T15:43:32.1710469Z ##[endgroup]
2025-12-04T15:43:32.1937561Z   adding: test/test-reports/td_exclusions-8f4b859dc7ee5c40b00d.json (deflated 82%)
2025-12-04T15:43:32.1947921Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d2163ec8f4306bf7.json (deflated 94%)
2025-12-04T15:43:32.1978483Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-7dfb99a0e36ebc6b.json (deflated 94%)
2025-12-04T15:43:32.1983920Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f45bd9366a90530e.json (deflated 96%)
2025-12-04T15:43:32.1990221Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-85306c1f70284b1c.json (deflated 96%)
2025-12-04T15:43:32.2006994Z   adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-e8dc2e2d2922989b.json (deflated 94%)
2025-12-04T15:43:32.2009020Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.json (deflated 88%)
2025-12-04T15:43:32.2011161Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.json (deflated 88%)
2025-12-04T15:43:32.2012803Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.json (deflated 88%)
2025-12-04T15:43:32.2014878Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.json (deflated 88%)
2025-12-04T15:43:32.2016550Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.json (deflated 88%)
2025-12-04T15:43:32.2018717Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.json (deflated 88%)
2025-12-04T15:43:32.2020493Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.json (deflated 88%)
2025-12-04T15:43:32.2022814Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.json (deflated 88%)
2025-12-04T15:43:32.2024182Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.json (deflated 88%)
2025-12-04T15:43:32.2026303Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.json (deflated 88%)
2025-12-04T15:43:32.2027952Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.json (deflated 88%)
2025-12-04T15:43:32.2030022Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.json (deflated 88%)
2025-12-04T15:43:32.2031667Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.json (deflated 88%)
2025-12-04T15:43:32.2033772Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.json (deflated 88%)
2025-12-04T15:43:32.2035413Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.json (deflated 88%)
2025-12-04T15:43:32.2038319Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.json (deflated 91%)
2025-12-04T15:43:32.2039951Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.json (deflated 88%)
2025-12-04T15:43:32.2042066Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.json (deflated 88%)
2025-12-04T15:43:32.2043788Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.json (deflated 88%)
2025-12-04T15:43:32.2045768Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.json (deflated 88%)
2025-12-04T15:43:32.2047399Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.json (deflated 88%)
2025-12-04T15:43:32.2049460Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.json (deflated 88%)
2025-12-04T15:43:32.2051085Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.json (deflated 88%)
2025-12-04T15:43:32.2053088Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.json (deflated 88%)
2025-12-04T15:43:32.2054736Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.json (deflated 88%)
2025-12-04T15:43:32.2056754Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.json (deflated 88%)
2025-12-04T15:43:32.2058413Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.json (deflated 88%)
2025-12-04T15:43:32.2060480Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.json (deflated 88%)
2025-12-04T15:43:32.2062106Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.json (deflated 88%)
2025-12-04T15:43:32.2064105Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.json (deflated 88%)
2025-12-04T15:43:32.2066843Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.json (deflated 91%)
2025-12-04T15:43:32.2069458Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.json (deflated 90%)
2025-12-04T15:43:32.2071993Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.json (deflated 90%)
2025-12-04T15:43:32.2073695Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.json (deflated 88%)
2025-12-04T15:43:32.2076404Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.json (deflated 90%)
2025-12-04T15:43:32.2079020Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.json (deflated 90%)
2025-12-04T15:43:32.2081288Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.json (deflated 89%)
2025-12-04T15:43:32.2082850Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.json (deflated 88%)
2025-12-04T15:43:32.2085649Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.json (deflated 88%)
2025-12-04T15:43:32.2087362Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.json (deflated 88%)
2025-12-04T15:43:32.2089485Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.json (deflated 88%)
2025-12-04T15:43:32.2091180Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.json (deflated 88%)
2025-12-04T15:43:32.2093264Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.json (deflated 88%)
2025-12-04T15:43:32.2095196Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.json (deflated 88%)
2025-12-04T15:43:32.2096858Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.json (deflated 88%)
2025-12-04T15:43:32.2098926Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.json (deflated 88%)
2025-12-04T15:43:32.2100922Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.json (deflated 88%)
2025-12-04T15:43:32.2102631Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.json (deflated 88%)
2025-12-04T15:43:32.2104692Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.json (deflated 88%)
2025-12-04T15:43:32.2106389Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.json (deflated 88%)
2025-12-04T15:43:32.2109259Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.json (deflated 88%)
2025-12-04T15:43:32.2110904Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.json (deflated 88%)
2025-12-04T15:43:32.2112523Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.json (deflated 88%)
2025-12-04T15:43:32.2114585Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.json (deflated 88%)
2025-12-04T15:43:32.2116320Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.json (deflated 88%)
2025-12-04T15:43:32.2118340Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.json (deflated 88%)
2025-12-04T15:43:32.2120051Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.json (deflated 88%)
2025-12-04T15:43:32.2122086Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.json (deflated 88%)
2025-12-04T15:43:32.2123883Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.json (deflated 88%)
2025-12-04T15:43:32.2125761Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.json (deflated 88%)
2025-12-04T15:43:32.2127663Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.json (deflated 88%)
2025-12-04T15:43:32.2129559Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.json (deflated 88%)
2025-12-04T15:43:32.2131473Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.json (deflated 88%)
2025-12-04T15:43:32.2133325Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.json (deflated 88%)
2025-12-04T15:43:32.2135391Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.json (deflated 88%)
2025-12-04T15:43:32.2136899Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.json (deflated 88%)
2025-12-04T15:43:32.2140228Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.json (deflated 92%)
2025-12-04T15:43:32.2141897Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.json (deflated 88%)
2025-12-04T15:43:32.2143892Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.json (deflated 88%)
2025-12-04T15:43:32.2145563Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.json (deflated 88%)
2025-12-04T15:43:32.2147506Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.json (deflated 88%)
2025-12-04T15:43:32.2149147Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.json (deflated 88%)
2025-12-04T15:43:32.2151691Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.json (deflated 90%)
2025-12-04T15:43:32.2153374Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.json (deflated 88%)
2025-12-04T15:43:32.2155337Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.json (deflated 88%)
2025-12-04T15:43:32.2157011Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.json (deflated 88%)
2025-12-04T15:43:32.2158969Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.json (deflated 88%)
2025-12-04T15:43:32.2160581Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.json (deflated 88%)
2025-12-04T15:43:32.2163115Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.json (deflated 90%)
2025-12-04T15:43:32.2164760Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.json (deflated 88%)
2025-12-04T15:43:32.2166699Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.json (deflated 88%)
2025-12-04T15:43:32.2168320Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.json (deflated 88%)
2025-12-04T15:43:32.2170277Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.json (deflated 88%)
2025-12-04T15:43:32.2171897Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.json (deflated 88%)
2025-12-04T15:43:32.2174997Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.json (deflated 91%)
2025-12-04T15:43:32.2176993Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.json (deflated 89%)
2025-12-04T15:43:32.2178901Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.json (deflated 89%)
2025-12-04T15:43:32.2180943Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.json (deflated 89%)
2025-12-04T15:43:32.2182871Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.json (deflated 89%)
2025-12-04T15:43:32.2184823Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.json (deflated 89%)
2025-12-04T15:43:32.2186731Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.json (deflated 89%)
2025-12-04T15:43:32.2188680Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.json (deflated 89%)
2025-12-04T15:43:32.2190569Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.json (deflated 89%)
2025-12-04T15:43:32.2192620Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.json (deflated 89%)
2025-12-04T15:43:32.2194293Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.json (deflated 89%)
2025-12-04T15:43:32.2196361Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.json (deflated 89%)
2025-12-04T15:43:32.2198254Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.json (deflated 89%)
2025-12-04T15:43:32.2200197Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.json (deflated 89%)
2025-12-04T15:43:32.2202101Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.json (deflated 89%)
2025-12-04T15:43:32.2204081Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.json (deflated 89%)
2025-12-04T15:43:32.2205990Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.json (deflated 89%)
2025-12-04T15:43:32.2208213Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.json (deflated 89%)
2025-12-04T15:43:32.2210643Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.json (deflated 88%)
2025-12-04T15:43:32.2212560Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.json (deflated 88%)
2025-12-04T15:43:32.2214123Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.json (deflated 88%)
2025-12-04T15:43:32.2218681Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.json (deflated 98%)
2025-12-04T15:43:32.2219868Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-fcf8b9b0a2e7a178.json (deflated 93%)
2025-12-04T15:43:32.2239955Z   adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-cc2491bbd877af9c.json (deflated 95%)
2025-12-04T15:43:32.2244610Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-66246eed1b64fd5c.json (deflated 89%)
2025-12-04T15:43:32.2339845Z   adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-38411ac3079c7061.json (deflated 95%)
2025-12-04T15:43:32.2343364Z   adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-ca7327bb8f17c961.json (deflated 84%)
2025-12-04T15:43:32.2346379Z   adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-b7f63c3b423acf1d.json (deflated 91%)
2025-12-04T15:43:32.2348304Z   adding: test/test-reports/python-pytest/dynamo.test_callback/dynamo.test_callback-6c0ee54264bcedf0.json (deflated 82%)
2025-12-04T15:43:32.2349572Z   adding: test/test-reports/python-pytest/inductor.test_custom_op_autotune/inductor.test_custom_op_autotune-8f7d8d00cc13374f.json (deflated 80%)
2025-12-04T15:43:32.2353560Z   adding: test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.json (deflated 90%)
2025-12-04T15:43:32.2382850Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.json (deflated 97%)
2025-12-04T15:43:32.2383939Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.json (deflated 91%)
2025-12-04T15:43:32.2385195Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.json (deflated 91%)
2025-12-04T15:43:32.2386493Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.json (deflated 91%)
2025-12-04T15:43:32.2387855Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.json (deflated 91%)
2025-12-04T15:43:32.2389443Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.json (deflated 91%)
2025-12-04T15:43:32.2415355Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.json (deflated 97%)
2025-12-04T15:43:32.2422886Z   adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-95ccd07868721469.json (deflated 95%)
2025-12-04T15:43:32.2437838Z   adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-1e96fc6cc9093b07.json (deflated 96%)
2025-12-04T15:43:32.2453997Z   adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-91f289dc18834c3e.json (deflated 96%)
2025-12-04T15:43:32.2500073Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-05b5b699aba88456.json (deflated 95%)
2025-12-04T15:43:32.2501145Z   adding: test/test-reports/python-pytest/dynamo.test_after_aot/dynamo.test_after_aot-138e4478191117d7.json (deflated 59%)
2025-12-04T15:43:32.2503972Z   adding: test/test-reports/python-pytest/inductor.test_snode_runtime/inductor.test_snode_runtime-f1ec066e866be26d.json (deflated 92%)
2025-12-04T15:43:32.2543869Z   adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-bf57fb8d20e32a72.json (deflated 93%)
2025-12-04T15:43:32.2578562Z   adding: test/test-reports/python-pytest/test_testing/test_testing-4c4caba52af0adff.json (deflated 97%)
2025-12-04T15:43:32.2579761Z   adding: test/test-reports/python-pytest/inductor.test_autoheuristic/inductor.test_autoheuristic-10f7d7896ce04bc8.json (stored 0%)
2025-12-04T15:43:32.2580949Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-c4d4e9aba2280ad9.json (deflated 92%)
2025-12-04T15:43:32.2582254Z   adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-8a04be886b6d69cf.json (deflated 82%)
2025-12-04T15:43:32.2583524Z   adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-c7e05865cddca77f.json (deflated 74%)
2025-12-04T15:43:32.2584915Z   adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6d20a7277844030b.json (deflated 74%)
2025-12-04T15:43:32.2586309Z   adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-6a2d2929a87aa7f5.json (deflated 83%)
2025-12-04T15:43:32.2587705Z   adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-b498ae4cc20525c9.json (deflated 70%)
2025-12-04T15:43:32.2589102Z   adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-4c5fe50d62df582d.json (deflated 62%)
2025-12-04T15:43:32.2590233Z   adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-179ecdae5d21ef0e.json (deflated 66%)
2025-12-04T15:43:32.2591446Z   adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-bacbff1a865ff8bb.json (deflated 62%)
2025-12-04T15:43:32.2592657Z   adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-e71c26709471ff2e.json (deflated 51%)
2025-12-04T15:43:32.2593863Z   adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-45ff8ae422230f99.json (deflated 90%)
2025-12-04T15:43:32.2595116Z   adding: test/test-reports/python-pytest/inductor.test_provenance_tracing/inductor.test_provenance_tracing-6455ccf06df051be.json (deflated 87%)
2025-12-04T15:43:32.2596324Z   adding: test/test-reports/python-pytest/inductor.test_memory_planning/inductor.test_memory_planning-d9b25b367275156e.json (deflated 71%)
2025-12-04T15:43:32.2638887Z   adding: test/test-reports/python-pytest/export.test_cpp_serdes/export.test_cpp_serdes-72e11f38870e0d13.json (deflated 96%)
2025-12-04T15:43:32.2657839Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5ad0fee917746162.json (deflated 97%)
2025-12-04T15:43:32.2660697Z   adding: test/test-reports/python-pytest/test_sort_and_select/test_sort_and_select-049427debff60b53.json (deflated 95%)
2025-12-04T15:43:32.2661802Z   adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-cccd30d217a8d074.json (deflated 88%)
2025-12-04T15:43:32.2664460Z   adding: test/test-reports/python-pytest/test_package/test_package-a2f65f799bf50b4a.json (deflated 93%)
2025-12-04T15:43:32.2665439Z   adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-c19a0c4320bf6e65.json (deflated 64%)
2025-12-04T15:43:32.2666623Z   adding: test/test-reports/python-pytest/test_utils_config_module/test_utils_config_module-cd73bdff208ab311.json (deflated 90%)
2025-12-04T15:43:32.2667655Z   adding: test/test-reports/python-pytest/test_hop_infra/test_hop_infra-d1efcb546b726ee3.json (deflated 72%)
2025-12-04T15:43:32.2668735Z   adding: test/test-reports/python-pytest/test_appending_byte_serializer/test_appending_byte_serializer-db1af3fc87bd6240.json (deflated 76%)
2025-12-04T15:43:32.2670211Z   adding: test/test-reports/python-pytest/test_ao_sparsity/test_ao_sparsity-47b60e8cb29a5ef6.json (deflated 91%)
2025-12-04T15:43:32.2671206Z   adding: test/test-reports/python-pytest/test_extension_utils/test_extension_utils-5e3baa267a09a3bb.json (deflated 64%)
2025-12-04T15:43:32.2672460Z   adding: test/test-reports/python-pytest/nn.attention.test_fa4/nn.attention.test_fa4-2d55ad78ccee943a.json (deflated 98%)
2025-12-04T15:43:32.2678524Z   adding: test/test-reports/python-pytest/typing.test_python_operators/typing.test_python_operators-7b01e9f4c56696ce.json (deflated 98%)
2025-12-04T15:43:32.2679611Z   adding: test/test-reports/python-pytest/torch_np.test_dtype/torch_np.test_dtype-50c590a3e827391c.json (deflated 96%)
2025-12-04T15:43:32.2680544Z   adding: test/test-reports/python-pytest/test_file_check/test_file_check-c5f916d4f839abe2.json (deflated 61%)
2025-12-04T15:43:32.2681513Z   adding: test/test-reports/python-pytest/profiler.test_kineto/profiler.test_kineto-1437f02ea71dbd19.json (deflated 37%)
2025-12-04T15:43:32.2682596Z   adding: test/test-reports/python-pytest/functorch.test_ac_knapsack/functorch.test_ac_knapsack-a2f3dae1f99bc885.json (deflated 87%)
2025-12-04T15:43:32.2717066Z   adding: test/test-reports/python-pytest/torch_np.test_nep50_examples/torch_np.test_nep50_examples-87e42828c2fde829.json (deflated 99%)
2025-12-04T15:43:32.2738870Z   adding: test/test-reports/python-pytest/test_torch/test_torch-6322eeaa434bd119.json (deflated 95%)
2025-12-04T15:43:32.2739766Z   adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-6cf9ed264c8fa189.json (stored 0%)
2025-12-04T15:43:32.2944934Z   adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-510898c7a9dfb9c9.json (deflated 98%)
2025-12-04T15:43:32.2963216Z   adding: test/test-reports/python-pytest/test_modules/test_modules-1ceed37f0450876d.json (deflated 96%)
2025-12-04T15:43:32.2969484Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-320a7bc7a2da135c.json (deflated 97%)
2025-12-04T15:43:32.2972868Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-9c6a851d43187f63.json (deflated 97%)
2025-12-04T15:43:32.2973994Z   adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-612fe6974f2e86fb.json (deflated 33%)
2025-12-04T15:43:32.2975000Z   adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-573eaa6de6818c33.json (deflated 94%)
2025-12-04T15:43:32.2975961Z   adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-8ae5e584fb53bb5e.json (deflated 96%)
2025-12-04T15:43:32.2977369Z   adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-419c9aea1e4e06f2.json (deflated 87%)
2025-12-04T15:43:32.2981569Z   adding: test/test-reports/python-pytest/test_indexing/test_indexing-bb3db4f55bab2e87.json (deflated 95%)
2025-12-04T15:43:32.2982671Z   adding: test/test-reports/python-pytest/test_type_info/test_type_info-3cbecfd6afe8711f.json (deflated 83%)
2025-12-04T15:43:32.3002128Z   adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-3265775c77799c99.json (deflated 95%)
2025-12-04T15:43:32.3003708Z   adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5e8dbe55d5e60a97.json (deflated 95%)
2025-12-04T15:43:32.3006237Z   adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-339f2b8a0ba2c562.json (deflated 94%)
2025-12-04T15:43:32.3008000Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-7a9eb44e36e96ef2.json (deflated 95%)
2025-12-04T15:43:32.3009980Z   adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-8a1338a601c4ef0b.json (deflated 91%)
2025-12-04T15:43:32.3011008Z   adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-d08ca7b1f6355251.json (deflated 88%)
2025-12-04T15:43:32.3011983Z   adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-bb3f84e769cc626f.json (deflated 91%)
2025-12-04T15:43:32.3012939Z   adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-081f0752aeda15ae.json (deflated 83%)
2025-12-04T15:43:32.3021435Z   adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-3f39f26aca555a70.json (deflated 98%)
2025-12-04T15:43:32.3094715Z   adding: test/test-reports/python-pytest/test_reductions/test_reductions-31a848701d5079bd.json (deflated 98%)
2025-12-04T15:43:32.3095747Z   adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204154318.json (deflated 38%)
2025-12-04T15:43:32.3125878Z ##[group]Run # Remove any previous test reports if they exist
2025-12-04T15:43:32.3126365Z # Remove any previous test reports if they exist
2025-12-04T15:43:32.3126771Z rm -f test-reports-*.zip
2025-12-04T15:43:32.3127265Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv'
2025-12-04T15:43:32.3136520Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:32.3136899Z env:
2025-12-04T15:43:32.3137114Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:32.3137379Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:32.3137696Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:32.3138275Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:32.3139034Z   FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T15:43:32.3139611Z ##[endgroup]
2025-12-04T15:43:32.3287611Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d2163ec8f4306bf7.xml (deflated 93%)
2025-12-04T15:43:32.3312706Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-7dfb99a0e36ebc6b.xml (deflated 93%)
2025-12-04T15:43:32.3317191Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f45bd9366a90530e.xml (deflated 92%)
2025-12-04T15:43:32.3322051Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-85306c1f70284b1c.xml (deflated 93%)
2025-12-04T15:43:32.3337627Z   adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-e8dc2e2d2922989b.xml (deflated 94%)
2025-12-04T15:43:32.3339407Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml (deflated 88%)
2025-12-04T15:43:32.3341706Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml (deflated 88%)
2025-12-04T15:43:32.3343333Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml (deflated 88%)
2025-12-04T15:43:32.3345705Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml (deflated 88%)
2025-12-04T15:43:32.3347155Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml (deflated 88%)
2025-12-04T15:43:32.3349577Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml (deflated 88%)
2025-12-04T15:43:32.3351120Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml (deflated 88%)
2025-12-04T15:43:32.3353294Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml (deflated 88%)
2025-12-04T15:43:32.3355328Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml (deflated 88%)
2025-12-04T15:43:32.3357378Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml (deflated 88%)
2025-12-04T15:43:32.3359335Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml (deflated 88%)
2025-12-04T15:43:32.3361191Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml (deflated 88%)
2025-12-04T15:43:32.3363111Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml (deflated 88%)
2025-12-04T15:43:32.3365336Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml (deflated 88%)
2025-12-04T15:43:32.3366942Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml (deflated 88%)
2025-12-04T15:43:32.3370057Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml (deflated 91%)
2025-12-04T15:43:32.3371665Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml (deflated 88%)
2025-12-04T15:43:32.3373933Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml (deflated 88%)
2025-12-04T15:43:32.3375518Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml (deflated 88%)
2025-12-04T15:43:32.3377625Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml (deflated 88%)
2025-12-04T15:43:32.3379395Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml (deflated 88%)
2025-12-04T15:43:32.3381480Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml (deflated 88%)
2025-12-04T15:43:32.3383183Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml (deflated 88%)
2025-12-04T15:43:32.3385282Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml (deflated 88%)
2025-12-04T15:43:32.3386980Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml (deflated 88%)
2025-12-04T15:43:32.3389115Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml (deflated 88%)
2025-12-04T15:43:32.3390775Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml (deflated 88%)
2025-12-04T15:43:32.3392937Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml (deflated 88%)
2025-12-04T15:43:32.3394860Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml (deflated 88%)
2025-12-04T15:43:32.3396572Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml (deflated 88%)
2025-12-04T15:43:32.3399633Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml (deflated 90%)
2025-12-04T15:43:32.3402364Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml (deflated 90%)
2025-12-04T15:43:32.3405250Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml (deflated 90%)
2025-12-04T15:43:32.3406909Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml (deflated 88%)
2025-12-04T15:43:32.3410375Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml (deflated 90%)
2025-12-04T15:43:32.3412918Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml (deflated 90%)
2025-12-04T15:43:32.3415143Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml (deflated 88%)
2025-12-04T15:43:32.3417139Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml (deflated 88%)
2025-12-04T15:43:32.3419048Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml (deflated 88%)
2025-12-04T15:43:32.3421932Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml (deflated 88%)
2025-12-04T15:43:32.3423835Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml (deflated 88%)
2025-12-04T15:43:32.3426191Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml (deflated 88%)
2025-12-04T15:43:32.3427702Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml (deflated 88%)
2025-12-04T15:43:32.3429884Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml (deflated 88%)
2025-12-04T15:43:32.3431735Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml (deflated 88%)
2025-12-04T15:43:32.3433864Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml (deflated 88%)
2025-12-04T15:43:32.3435852Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml (deflated 88%)
2025-12-04T15:43:32.3437852Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml (deflated 88%)
2025-12-04T15:43:32.3439836Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml (deflated 88%)
2025-12-04T15:43:32.3441821Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml (deflated 88%)
2025-12-04T15:43:32.3443584Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml (deflated 88%)
2025-12-04T15:43:32.3445752Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml (deflated 88%)
2025-12-04T15:43:32.3447477Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml (deflated 88%)
2025-12-04T15:43:32.3449626Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml (deflated 88%)
2025-12-04T15:43:32.3451480Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml (deflated 88%)
2025-12-04T15:43:32.3453556Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml (deflated 88%)
2025-12-04T15:43:32.3455536Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml (deflated 88%)
2025-12-04T15:43:32.3457517Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml (deflated 88%)
2025-12-04T15:43:32.3459532Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml (deflated 88%)
2025-12-04T15:43:32.3461403Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml (deflated 88%)
2025-12-04T15:43:32.3463529Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml (deflated 88%)
2025-12-04T15:43:32.3465580Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml (deflated 88%)
2025-12-04T15:43:32.3467290Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml (deflated 88%)
2025-12-04T15:43:32.3469389Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml (deflated 88%)
2025-12-04T15:43:32.3471265Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml (deflated 88%)
2025-12-04T15:43:32.3473273Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml (deflated 88%)
2025-12-04T15:43:32.3476459Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml (deflated 91%)
2025-12-04T15:43:32.3478187Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml (deflated 88%)
2025-12-04T15:43:32.3480329Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml (deflated 88%)
2025-12-04T15:43:32.3481995Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml (deflated 88%)
2025-12-04T15:43:32.3484090Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml (deflated 88%)
2025-12-04T15:43:32.3485756Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml (deflated 88%)
2025-12-04T15:43:32.3488493Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml (deflated 89%)
2025-12-04T15:43:32.3490341Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml (deflated 88%)
2025-12-04T15:43:32.3492327Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml (deflated 88%)
2025-12-04T15:43:32.3494232Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml (deflated 88%)
2025-12-04T15:43:32.3496119Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml (deflated 88%)
2025-12-04T15:43:32.3498106Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml (deflated 88%)
2025-12-04T15:43:32.3500745Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml (deflated 89%)
2025-12-04T15:43:32.3502458Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml (deflated 88%)
2025-12-04T15:43:32.3504502Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml (deflated 88%)
2025-12-04T15:43:32.3506191Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml (deflated 88%)
2025-12-04T15:43:32.3508434Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml (deflated 88%)
2025-12-04T15:43:32.3511103Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml (deflated 88%)
2025-12-04T15:43:32.3514335Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml (deflated 90%)
2025-12-04T15:43:32.3516456Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml (deflated 88%)
2025-12-04T15:43:32.3518533Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml (deflated 88%)
2025-12-04T15:43:32.3520602Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml (deflated 88%)
2025-12-04T15:43:32.3522706Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml (deflated 88%)
2025-12-04T15:43:32.3524939Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml (deflated 88%)
2025-12-04T15:43:32.3526867Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml (deflated 88%)
2025-12-04T15:43:32.3528912Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml (deflated 88%)
2025-12-04T15:43:32.3530982Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml (deflated 88%)
2025-12-04T15:43:32.3533051Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml (deflated 88%)
2025-12-04T15:43:32.3535153Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml (deflated 88%)
2025-12-04T15:43:32.3537220Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml (deflated 88%)
2025-12-04T15:43:32.3539370Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml (deflated 88%)
2025-12-04T15:43:32.3541476Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml (deflated 88%)
2025-12-04T15:43:32.3543553Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml (deflated 88%)
2025-12-04T15:43:32.3545580Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml (deflated 88%)
2025-12-04T15:43:32.3547640Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml (deflated 88%)
2025-12-04T15:43:32.3549748Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml (deflated 88%)
2025-12-04T15:43:32.3551446Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml (deflated 88%)
2025-12-04T15:43:32.3553502Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml (deflated 88%)
2025-12-04T15:43:32.3555168Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml (deflated 88%)
2025-12-04T15:43:32.3558865Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml (deflated 97%)
2025-12-04T15:43:32.3559876Z   adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-fcf8b9b0a2e7a178.xml (deflated 90%)
2025-12-04T15:43:32.3577256Z   adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-cc2491bbd877af9c.xml (deflated 94%)
2025-12-04T15:43:32.3581356Z   adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-66246eed1b64fd5c.xml (deflated 87%)
2025-12-04T15:43:32.3663905Z   adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-38411ac3079c7061.xml (deflated 95%)
2025-12-04T15:43:32.3665519Z   adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-ca7327bb8f17c961.xml (deflated 81%)
2025-12-04T15:43:32.3668898Z   adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-b7f63c3b423acf1d.xml (deflated 89%)
2025-12-04T15:43:32.3670466Z   adding: test/test-reports/python-pytest/dynamo.test_callback/dynamo.test_callback-6c0ee54264bcedf0.xml (deflated 81%)
2025-12-04T15:43:32.3672900Z   adding: test/test-reports/python-pytest/inductor.test_custom_op_autotune/inductor.test_custom_op_autotune-8f7d8d00cc13374f.xml (deflated 79%)
2025-12-04T15:43:32.3677673Z   adding: test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml (deflated 86%)
2025-12-04T15:43:32.3702005Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml (deflated 95%)
2025-12-04T15:43:32.3703452Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml (deflated 90%)
2025-12-04T15:43:32.3705166Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml (deflated 90%)
2025-12-04T15:43:32.3706420Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml (deflated 90%)
2025-12-04T15:43:32.3708009Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml (deflated 90%)
2025-12-04T15:43:32.3711533Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml (deflated 90%)
2025-12-04T15:43:32.3731105Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml (deflated 96%)
2025-12-04T15:43:32.3737527Z   adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-95ccd07868721469.xml (deflated 93%)
2025-12-04T15:43:32.3750494Z   adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-1e96fc6cc9093b07.xml (deflated 95%)
2025-12-04T15:43:32.3764159Z   adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-91f289dc18834c3e.xml (deflated 95%)
2025-12-04T15:43:32.3800025Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-05b5b699aba88456.xml (deflated 93%)
2025-12-04T15:43:32.3801033Z   adding: test/test-reports/python-pytest/dynamo.test_after_aot/dynamo.test_after_aot-138e4478191117d7.xml (deflated 52%)
2025-12-04T15:43:32.3803928Z   adding: test/test-reports/python-pytest/inductor.test_snode_runtime/inductor.test_snode_runtime-f1ec066e866be26d.xml (deflated 92%)
2025-12-04T15:43:32.3841657Z   adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-bf57fb8d20e32a72.xml (deflated 92%)
2025-12-04T15:43:32.3864216Z   adding: test/test-reports/python-pytest/test_testing/test_testing-4c4caba52af0adff.xml (deflated 96%)
2025-12-04T15:43:32.3865377Z   adding: test/test-reports/python-pytest/inductor.test_autoheuristic/inductor.test_autoheuristic-10f7d7896ce04bc8.xml (deflated 28%)
2025-12-04T15:43:32.3866560Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-c4d4e9aba2280ad9.xml (deflated 88%)
2025-12-04T15:43:32.3867936Z   adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-8a04be886b6d69cf.xml (deflated 79%)
2025-12-04T15:43:32.3869194Z   adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-c7e05865cddca77f.xml (deflated 59%)
2025-12-04T15:43:32.3870455Z   adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6d20a7277844030b.xml (deflated 64%)
2025-12-04T15:43:32.3871973Z   adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-6a2d2929a87aa7f5.xml (deflated 80%)
2025-12-04T15:43:32.3873241Z   adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-b498ae4cc20525c9.xml (deflated 63%)
2025-12-04T15:43:32.3874585Z   adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-4c5fe50d62df582d.xml (deflated 52%)
2025-12-04T15:43:32.3875726Z   adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-179ecdae5d21ef0e.xml (deflated 61%)
2025-12-04T15:43:32.3876858Z   adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-bacbff1a865ff8bb.xml (deflated 48%)
2025-12-04T15:43:32.3878068Z   adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-e71c26709471ff2e.xml (deflated 50%)
2025-12-04T15:43:32.3879286Z   adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-45ff8ae422230f99.xml (deflated 85%)
2025-12-04T15:43:32.3880549Z   adding: test/test-reports/python-pytest/inductor.test_provenance_tracing/inductor.test_provenance_tracing-6455ccf06df051be.xml (deflated 85%)
2025-12-04T15:43:32.3881780Z   adding: test/test-reports/python-pytest/inductor.test_memory_planning/inductor.test_memory_planning-d9b25b367275156e.xml (deflated 67%)
2025-12-04T15:43:32.3920614Z   adding: test/test-reports/python-pytest/export.test_cpp_serdes/export.test_cpp_serdes-72e11f38870e0d13.xml (deflated 96%)
2025-12-04T15:43:32.3938155Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5ad0fee917746162.xml (deflated 97%)
2025-12-04T15:43:32.3940095Z   adding: test/test-reports/python-pytest/test_sort_and_select/test_sort_and_select-049427debff60b53.xml (deflated 91%)
2025-12-04T15:43:32.3941243Z   adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-cccd30d217a8d074.xml (deflated 77%)
2025-12-04T15:43:32.3943471Z   adding: test/test-reports/python-pytest/test_package/test_package-a2f65f799bf50b4a.xml (deflated 87%)
2025-12-04T15:43:32.3944421Z   adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-c19a0c4320bf6e65.xml (deflated 50%)
2025-12-04T15:43:32.3945596Z   adding: test/test-reports/python-pytest/test_utils_config_module/test_utils_config_module-cd73bdff208ab311.xml (deflated 82%)
2025-12-04T15:43:32.3946582Z   adding: test/test-reports/python-pytest/test_hop_infra/test_hop_infra-d1efcb546b726ee3.xml (deflated 57%)
2025-12-04T15:43:32.3947643Z   adding: test/test-reports/python-pytest/test_appending_byte_serializer/test_appending_byte_serializer-db1af3fc87bd6240.xml (deflated 61%)
2025-12-04T15:43:32.3948943Z   adding: test/test-reports/python-pytest/test_ao_sparsity/test_ao_sparsity-47b60e8cb29a5ef6.xml (deflated 85%)
2025-12-04T15:43:32.3949917Z   adding: test/test-reports/python-pytest/test_extension_utils/test_extension_utils-5e3baa267a09a3bb.xml (deflated 52%)
2025-12-04T15:43:32.3951204Z   adding: test/test-reports/python-pytest/nn.attention.test_fa4/nn.attention.test_fa4-2d55ad78ccee943a.xml (deflated 97%)
2025-12-04T15:43:32.3955414Z   adding: test/test-reports/python-pytest/typing.test_python_operators/typing.test_python_operators-7b01e9f4c56696ce.xml (deflated 96%)
2025-12-04T15:43:32.3956821Z   adding: test/test-reports/python-pytest/torch_np.test_dtype/torch_np.test_dtype-50c590a3e827391c.xml (deflated 94%)
2025-12-04T15:43:32.3958026Z   adding: test/test-reports/python-pytest/test_file_check/test_file_check-c5f916d4f839abe2.xml (deflated 47%)
2025-12-04T15:43:32.3959055Z   adding: test/test-reports/python-pytest/profiler.test_kineto/profiler.test_kineto-1437f02ea71dbd19.xml (deflated 37%)
2025-12-04T15:43:32.3960147Z   adding: test/test-reports/python-pytest/functorch.test_ac_knapsack/functorch.test_ac_knapsack-a2f3dae1f99bc885.xml (deflated 78%)
2025-12-04T15:43:32.3988896Z   adding: test/test-reports/python-pytest/torch_np.test_nep50_examples/torch_np.test_nep50_examples-87e42828c2fde829.xml (deflated 99%)
2025-12-04T15:43:32.4004859Z   adding: test/test-reports/python-pytest/test_torch/test_torch-6322eeaa434bd119.xml (deflated 92%)
2025-12-04T15:43:32.4006009Z   adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-6cf9ed264c8fa189.xml (deflated 28%)
2025-12-04T15:43:32.4145155Z   adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-510898c7a9dfb9c9.xml (deflated 97%)
2025-12-04T15:43:32.4158779Z   adding: test/test-reports/python-pytest/test_modules/test_modules-1ceed37f0450876d.xml (deflated 94%)
2025-12-04T15:43:32.4163480Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-320a7bc7a2da135c.xml (deflated 94%)
2025-12-04T15:43:32.4166165Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-9c6a851d43187f63.xml (deflated 95%)
2025-12-04T15:43:32.4167606Z   adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-612fe6974f2e86fb.xml (deflated 35%)
2025-12-04T15:43:32.4168759Z   adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-573eaa6de6818c33.xml (deflated 89%)
2025-12-04T15:43:32.4169921Z   adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-8ae5e584fb53bb5e.xml (deflated 92%)
2025-12-04T15:43:32.4171299Z   adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-419c9aea1e4e06f2.xml (deflated 79%)
2025-12-04T15:43:32.4173051Z   adding: test/test-reports/python-pytest/test_indexing/test_indexing-bb3db4f55bab2e87.xml (deflated 90%)
2025-12-04T15:43:32.4174155Z   adding: test/test-reports/python-pytest/test_type_info/test_type_info-3cbecfd6afe8711f.xml (deflated 68%)
2025-12-04T15:43:32.4190641Z   adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-3265775c77799c99.xml (deflated 93%)
2025-12-04T15:43:32.4192195Z   adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5e8dbe55d5e60a97.xml (deflated 91%)
2025-12-04T15:43:32.4193614Z   adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-339f2b8a0ba2c562.xml (deflated 91%)
2025-12-04T15:43:32.4195255Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-7a9eb44e36e96ef2.xml (deflated 90%)
2025-12-04T15:43:32.4196964Z   adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-8a1338a601c4ef0b.xml (deflated 86%)
2025-12-04T15:43:32.4198478Z   adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-d08ca7b1f6355251.xml (deflated 81%)
2025-12-04T15:43:32.4199883Z   adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-bb3f84e769cc626f.xml (deflated 83%)
2025-12-04T15:43:32.4201130Z   adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-081f0752aeda15ae.xml (deflated 80%)
2025-12-04T15:43:32.4205619Z   adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-3f39f26aca555a70.xml (deflated 96%)
2025-12-04T15:43:32.4259787Z   adding: test/test-reports/python-pytest/test_reductions/test_reductions-31a848701d5079bd.xml (deflated 96%)
2025-12-04T15:43:32.4260823Z   adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204154318.xml (deflated 43%)
2025-12-04T15:43:32.4290378Z ##[group]Run # Remove any previous usage logs if they exist
2025-12-04T15:43:32.4290833Z # Remove any previous usage logs if they exist
2025-12-04T15:43:32.4291204Z rm -f logs-*.zip
2025-12-04T15:43:32.4291548Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true
2025-12-04T15:43:32.4292048Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true
2025-12-04T15:43:32.4301440Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:32.4301818Z env:
2025-12-04T15:43:32.4302022Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:32.4302284Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:32.4302597Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:32.4303162Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:32.4303894Z   FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T15:43:32.4304340Z ##[endgroup]
2025-12-04T15:43:32.4387775Z   adding: usage_log.txt (deflated 58%)
2025-12-04T15:43:32.4444987Z   adding: test/test-reports/inductor.test_aot_inductor_2.5_ac1d7e2a37fbed81_.log (deflated 90%)
2025-12-04T15:43:32.4460657Z   adding: test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_1.4_295ecc74e041d7f8_.log (deflated 92%)
2025-12-04T15:43:32.4470539Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_4.14_2b71ae42f7581618_.log (deflated 92%)
2025-12-04T15:43:32.4478104Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_12.14_f1debdb3c47cb0ae_.log (deflated 91%)
2025-12-04T15:43:32.4483802Z   adding: test/test-reports/inductor.test_flex_attention_6.6_cafbaa2a62098057_.log (deflated 89%)
2025-12-04T15:43:32.4843356Z   adding: test/test-reports/inductor.test_fp8_1.1_440b1865b73f9802_.log (deflated 95%)
2025-12-04T15:43:32.4844682Z   adding: test/test-reports/dynamo.test_model_output_1.1_2df9271f2ebae91b_.log (deflated 79%)
2025-12-04T15:43:32.4856013Z   adding: test/test-reports/inductor.test_triton_kernels_1.1_4c43492168172809_.log (deflated 92%)
2025-12-04T15:43:32.4860603Z   adding: test/test-reports/inductor.test_loop_ordering_1.1_cda1b68c4235c80b_.log (deflated 89%)
2025-12-04T15:43:32.4903558Z   adding: test/test-reports/export.test_serdes_1.1_c37c9c83d5d3a964_.log (deflated 91%)
2025-12-04T15:43:32.4904773Z   adding: test/test-reports/inductor.test_scatter_optimization_1.1_38363d3a7ae9f86e_.log (deflated 79%)
2025-12-04T15:43:32.4906883Z   adding: test/test-reports/inductor.test_padding_1.1_3b58a6813a3709bc_.log (deflated 86%)
2025-12-04T15:43:32.4907599Z   adding: test/test-reports/dynamo.test_callback_1.1_4647abf0637b193b_.log (deflated 61%)
2025-12-04T15:43:32.4908597Z   adding: test/test-reports/inductor.test_custom_op_autotune_1.1_2272505dccfac9af_.log (deflated 62%)
2025-12-04T15:43:32.4917927Z   adding: test/test-reports/test_cuda_1.1_5ed6ed395e86485d_.log (deflated 85%)
2025-12-04T15:43:32.4994042Z   adding: test/test-reports/test_sparse_1.1_e217f60a40d48402_.log (deflated 95%)
2025-12-04T15:43:32.5001866Z   adding: test/test-reports/test_ops_fwd_gradients_6.12_abead446b517b77f_.log (deflated 91%)
2025-12-04T15:43:32.5017348Z   adding: test/test-reports/test_ops_gradients_2.10_8b90327e47e16b38_.log (deflated 92%)
2025-12-04T15:43:32.5034000Z   adding: test/test-reports/test_ops_gradients_10.10_690d4f6748dd1bf7_.log (deflated 92%)
2025-12-04T15:43:32.5075761Z   adding: test/test-reports/functorch.test_ops_3.6_4e22832cb04fe87a_.log (deflated 92%)
2025-12-04T15:43:32.5076461Z   adding: test/test-reports/dynamo.test_after_aot_1.1_e8843ead62c525f1_.log (deflated 54%)
2025-12-04T15:43:32.5077343Z   adding: test/test-reports/inductor.test_snode_runtime_1.1_f8102af9af532885_.log (deflated 79%)
2025-12-04T15:43:32.5094620Z   adding: test/test-reports/inductor.test_compiled_autograd_1.2_d8737cb5eeb8c364_.log (deflated 90%)
2025-12-04T15:43:32.5139812Z   adding: test/test-reports/test_testing_1.1_6250d60ab394f89f_.log (deflated 94%)
2025-12-04T15:43:32.5140524Z   adding: test/test-reports/inductor.test_autoheuristic_1.1_6939193d627efb00_.log (deflated 50%)
2025-12-04T15:43:32.5141302Z   adding: test/test-reports/inductor.test_cutedsl_template_1.1_c65b62856ae46e85_.log (deflated 77%)
2025-12-04T15:43:32.5142113Z   adding: test/test-reports/inductor.test_benchmark_fusion_1.1_f16e3698532d27f8_.log (deflated 76%)
2025-12-04T15:43:32.5142876Z   adding: test/test-reports/inductor.test_remote_cache_1.1_e90358269eb2823f_.log (deflated 60%)
2025-12-04T15:43:32.5143821Z   adding: test/test-reports/inductor.test_coordinate_descent_tuner_1.1_2fd6afd7cb5bda25_.log (deflated 68%)
2025-12-04T15:43:32.5144652Z   adding: test/test-reports/inductor.test_inplace_padding_1.1_25c4b19bcfb0badf_.log (deflated 69%)
2025-12-04T15:43:32.5145427Z   adding: test/test-reports/inductor.test_cudacodecache_1.1_20e9a908d42a6261_.log (deflated 56%)
2025-12-04T15:43:32.5146266Z   adding: test/test-reports/inductor.test_minifier_utils_1.1_82d82b53a102b66f_.log (deflated 60%)
2025-12-04T15:43:32.5147020Z   adding: test/test-reports/inductor.test_debug_trace_1.1_cc4f32af9453e690_.log (deflated 62%)
2025-12-04T15:43:32.5147744Z   adding: test/test-reports/export.test_tree_utils_1.1_0e627f819fabbb55_.log (deflated 55%)
2025-12-04T15:43:32.5148563Z   adding: test/test-reports/inductor.test_triton_wrapper_1.1_25aa967110a2fbe1_.log (deflated 53%)
2025-12-04T15:43:32.5149348Z   adding: test/test-reports/inductor.test_static_cuda_launcher_1.1_0c71a221d8835012_.log (deflated 79%)
2025-12-04T15:43:32.5150185Z   adding: test/test-reports/inductor.test_provenance_tracing_1.1_80110daa3530439c_.log (deflated 80%)
2025-12-04T15:43:32.5151265Z   adding: test/test-reports/inductor.test_memory_planning_1.1_fa1d6b036138d22f_.log (deflated 59%)
2025-12-04T15:43:32.5166506Z   adding: test/test-reports/export.test_cpp_serdes_1.1_75563679f31ba4f4_.log (deflated 89%)
2025-12-04T15:43:32.5630720Z   adding: test/test-reports/inductor.test_control_flow_2.4_3b4432ec9408add0_.log (deflated 96%)
2025-12-04T15:43:32.5633467Z   adding: test/test-reports/test_sort_and_select_1.1_bec7fa88f7702fb0_.log (deflated 89%)
2025-12-04T15:43:32.5634198Z   adding: test/test-reports/functorch.test_rearrange_1.1_a7b15b1a80eb0b56_.log (deflated 71%)
2025-12-04T15:43:32.5638631Z   adding: test/test-reports/test_package_1.1_f2ef9e9917fb97f5_.log (deflated 87%)
2025-12-04T15:43:32.5639277Z   adding: test/test-reports/test_mkl_verbose_1.1_a8ab8be9a564b785_.log (deflated 54%)
2025-12-04T15:43:32.5639976Z   adding: test/test-reports/test_utils_config_module_1.1_aa22a3cb4155f80d_.log (deflated 80%)
2025-12-04T15:43:32.5640938Z   adding: test/test-reports/test_hop_infra_1.1_f77bb32afa422f2e_.log (deflated 57%)
2025-12-04T15:43:32.5641682Z   adding: test/test-reports/test_appending_byte_serializer_1.1_7e52ee648e02aa85_.log (deflated 62%)
2025-12-04T15:43:32.5644401Z   adding: test/test-reports/test_ao_sparsity_1.1_c127cba34d71d100_.log (deflated 87%)
2025-12-04T15:43:32.5645084Z   adding: test/test-reports/test_extension_utils_1.1_7f66e708b7c7a8bc_.log (deflated 57%)
2025-12-04T15:43:32.5647509Z   adding: test/test-reports/nn.attention.test_fa4_1.1_59632c9893caec1b_.log (deflated 94%)
2025-12-04T15:43:32.5655143Z   adding: test/test-reports/typing.test_python_operators_1.1_1dbf7db937cf8b4b_.log (deflated 93%)
2025-12-04T15:43:32.5656238Z   adding: test/test-reports/torch_np.test_dtype_1.1_8ba7a24ba508317e_.log (deflated 88%)
2025-12-04T15:43:32.5656896Z   adding: test/test-reports/test_file_check_1.1_e6044214ffdb04bb_.log (deflated 53%)
2025-12-04T15:43:32.5657721Z   adding: test/test-reports/profiler.test_kineto_1.1_3901a608b259f0c8_.log (deflated 51%)
2025-12-04T15:43:32.5658780Z   adding: test/test-reports/functorch.test_ac_knapsack_1.1_a4a52ea27bf21bce_.log (deflated 78%)
2025-12-04T15:43:32.5687811Z   adding: test/test-reports/torch_np.test_nep50_examples_1.1_be93e5fc5572125c_.log (deflated 96%)
2025-12-04T15:43:32.5711258Z   adding: test/test-reports/test_torch_1.1_ed3627b67cdc077e_.log (deflated 91%)
2025-12-04T15:43:32.5711923Z   adding: test/test-reports/xpu.test_gemm_1.1_db81f0dcd896f79f_.log (deflated 48%)
2025-12-04T15:43:32.5981773Z   adding: test/test-reports/test_binary_ufuncs_1.1_d43f59e69a692663_.log (deflated 96%)
2025-12-04T15:43:32.6002483Z   adding: test/test-reports/test_modules_2.4_d8a3e6157b79afbb_.log (deflated 93%)
2025-12-04T15:43:32.6009442Z   adding: test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_3f3446ecd43fd597_.log (deflated 92%)
2025-12-04T15:43:32.6012319Z   adding: test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_bb9947961cd52757_.log (deflated 91%)
2025-12-04T15:43:32.6013190Z   adding: test/test-reports/lazy.test_debug_util_1.1_6159721dd42cd649_.log (deflated 51%)
2025-12-04T15:43:32.6013897Z   adding: test/test-reports/nn.test_load_state_dict_1.1_1f7336ad32e96ae1_.log (deflated 85%)
2025-12-04T15:43:32.6016299Z   adding: test/test-reports/test_shape_ops_1.1_17556160abffc005_.log (deflated 87%)
2025-12-04T15:43:32.6017578Z   adding: test/test-reports/profiler.test_memory_profiler_1.1_f20e3ab107ff598c_.log (deflated 82%)
2025-12-04T15:43:32.6022296Z   adding: test/test-reports/test_indexing_1.1_fbbd66d5cf2cd3ea_.log (deflated 90%)
2025-12-04T15:43:32.6022978Z   adding: test/test-reports/test_type_info_1.1_02020d4e7679db8b_.log (deflated 61%)
2025-12-04T15:43:32.6039927Z   adding: test/test-reports/functorch.test_aotdispatch_1.1_73fa05bc552fde2d_.log (deflated 91%)
2025-12-04T15:43:32.6041731Z   adding: test/test-reports/test_scatter_gather_ops_1.1_e624bed173f96ebf_.log (deflated 89%)
2025-12-04T15:43:32.6057216Z   adding: test/test-reports/test_cuda_multigpu_1.1_134114cd1fad822a_.log (deflated 85%)
2025-12-04T15:43:32.6058284Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a7d224f05328be14_.log (deflated 85%)
2025-12-04T15:43:32.6059252Z   adding: test/test-reports/test_jit_autocast_1.1_449f99b0d0d7aa89_.log (deflated 81%)
2025-12-04T15:43:32.6059966Z   adding: test/test-reports/test_xnnpack_integration_1.1_ef1a45d9c52ae3ce_.log (deflated 72%)
2025-12-04T15:43:32.6060903Z   adding: test/test-reports/nn.test_init_1.1_414026fa8e0e69bb_.log (deflated 78%)
2025-12-04T15:43:32.6061565Z   adding: test/test-reports/test_mobile_optimizer_1.1_2406b12c26273884_.log (deflated 67%)
2025-12-04T15:43:32.6062243Z   adding: test/test-reports/test_type_promotion_1.1_a64bbb5536dae6ab_.log (deflated 94%)
2025-12-04T15:43:32.6151358Z   adding: test/test-reports/test_reductions_1.1_4c27d813839f98a0_.log (deflated 96%)
2025-12-04T15:43:32.6180524Z ##[group]Run # Remove any previous debugging artifacts if they exist
2025-12-04T15:43:32.6181071Z # Remove any previous debugging artifacts if they exist
2025-12-04T15:43:32.6181482Z rm -f debug-*.zip
2025-12-04T15:43:32.6181761Z if [ -d 'test/debug' ]; then
2025-12-04T15:43:32.6182119Z   zip -r "debug-${FILE_SUFFIX}.zip" test/debug
2025-12-04T15:43:32.6182455Z fi
2025-12-04T15:43:32.6191394Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:32.6191771Z env:
2025-12-04T15:43:32.6191993Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:32.6192256Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:32.6192574Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:32.6193128Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:32.6193778Z   FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212
2025-12-04T15:43:32.6194221Z ##[endgroup]
2025-12-04T15:43:32.6287235Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T15:43:32.6287560Z with:
2025-12-04T15:43:32.6287773Z   s3-bucket: gha-artifacts
2025-12-04T15:43:32.6288094Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:32.6288484Z   retention-days: 14
2025-12-04T15:43:32.6288722Z   if-no-files-found: warn
2025-12-04T15:43:32.6288990Z   path: test-jsons-*.zip
2025-12-04T15:43:32.6289246Z   name: artifact
2025-12-04T15:43:32.6289455Z   region: us-east-1
2025-12-04T15:43:32.6289671Z env:
2025-12-04T15:43:32.6289874Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:32.6290134Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:32.6290448Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:32.6291005Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:32.6291493Z ##[endgroup]
2025-12-04T15:43:32.9746884Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T15:43:32.9747344Z With the provided path, there will be 1 file uploaded
2025-12-04T15:43:32.9747998Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:32.9820820Z Starting upload of test-jsons-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip
2025-12-04T15:43:33.1822410Z Finished upload of test-jsons-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip
2025-12-04T15:43:33.2120624Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T15:43:33.2120942Z with:
2025-12-04T15:43:33.2121155Z   s3-bucket: gha-artifacts
2025-12-04T15:43:33.2121478Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:33.2121824Z   retention-days: 14
2025-12-04T15:43:33.2122072Z   if-no-files-found: error
2025-12-04T15:43:33.2122334Z   path: test-reports-*.zip
2025-12-04T15:43:33.2122580Z   name: artifact
2025-12-04T15:43:33.2122794Z   region: us-east-1
2025-12-04T15:43:33.2123003Z env:
2025-12-04T15:43:33.2123202Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:33.2123454Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:33.2123767Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:33.2124324Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:33.2124824Z ##[endgroup]
2025-12-04T15:43:33.5754507Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T15:43:33.5754939Z With the provided path, there will be 1 file uploaded
2025-12-04T15:43:33.5755392Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:33.5828900Z Starting upload of test-reports-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip
2025-12-04T15:43:33.7656446Z Finished upload of test-reports-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip
2025-12-04T15:43:33.7966297Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T15:43:33.7966618Z with:
2025-12-04T15:43:33.7966819Z   s3-bucket: gha-artifacts
2025-12-04T15:43:33.7967125Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:33.7967470Z   retention-days: 14
2025-12-04T15:43:33.7967709Z   if-no-files-found: ignore
2025-12-04T15:43:33.7967980Z   path: logs-*.zip
2025-12-04T15:43:33.7968203Z   name: artifact
2025-12-04T15:43:33.7968421Z   region: us-east-1
2025-12-04T15:43:33.7968665Z env:
2025-12-04T15:43:33.7968890Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:33.7969138Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:33.7969444Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:33.7970008Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:33.7970506Z ##[endgroup]
2025-12-04T15:43:34.1281653Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T15:43:34.1282187Z With the provided path, there will be 1 file uploaded
2025-12-04T15:43:34.1282632Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:34.1355773Z Starting upload of logs-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip
2025-12-04T15:43:34.3096002Z Finished upload of logs-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip
2025-12-04T15:43:34.3394752Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T15:43:34.3395086Z with:
2025-12-04T15:43:34.3395285Z   s3-bucket: gha-artifacts
2025-12-04T15:43:34.3395593Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T15:43:34.3395934Z   retention-days: 14
2025-12-04T15:43:34.3396175Z   if-no-files-found: ignore
2025-12-04T15:43:34.3396447Z   path: debug-*.zip
2025-12-04T15:43:34.3396666Z   name: artifact
2025-12-04T15:43:34.3396875Z   region: us-east-1
2025-12-04T15:43:34.3397089Z env:
2025-12-04T15:43:34.3397298Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:34.3397544Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:34.3397860Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:34.3398419Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:34.3398959Z ##[endgroup]
2025-12-04T15:43:34.6657180Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded.
2025-12-04T15:43:34.6952115Z ##[group]Run # shellcheck disable=SC2156
2025-12-04T15:43:34.6952490Z # shellcheck disable=SC2156
2025-12-04T15:43:34.6953075Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;
2025-12-04T15:43:34.6962475Z shell: /usr/bin/bash -e {0}
2025-12-04T15:43:34.6962847Z env:
2025-12-04T15:43:34.6963047Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:34.6963308Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:34.6963608Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:34.6964174Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:34.6964678Z ##[endgroup]
2025-12-04T15:43:35.1051562Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a
2025-12-04T15:43:35.1052051Z with:
2025-12-04T15:43:35.1052344Z   name: coredumps-default-2-8-linux.g5.4xlarge.nvidia.gpu
2025-12-04T15:43:35.1052725Z   retention-days: 14
2025-12-04T15:43:35.1052983Z   if-no-files-found: ignore
2025-12-04T15:43:35.1053238Z   path: ./**/core.[1-9]*
2025-12-04T15:43:35.1053489Z   s3-bucket: gha-artifacts
2025-12-04T15:43:35.1053743Z   region: us-east-1
2025-12-04T15:43:35.1053943Z env:
2025-12-04T15:43:35.1054140Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:35.1054392Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:35.1054696Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:35.1055245Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:35.1055739Z ##[endgroup]
2025-12-04T15:43:49.6865296Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded.
2025-12-04T15:43:49.7315272Z Prepare all required actions
2025-12-04T15:43:49.7315630Z Getting action download info
2025-12-04T15:43:49.8903834Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548)
2025-12-04T15:43:50.2979921Z ##[group]Run ./.github/actions/upload-utilization-stats
2025-12-04T15:43:50.2980290Z with:
2025-12-04T15:43:50.2980486Z   job_id: 57118183212
2025-12-04T15:43:50.2981165Z   job_name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T15:43:50.2981911Z   workflow_name: periodic
2025-12-04T15:43:50.2982175Z   workflow_run_id: 19922826259
2025-12-04T15:43:50.2982441Z   workflow_attempt: 1
2025-12-04T15:43:50.2982662Z env:
2025-12-04T15:43:50.2982861Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:50.2983109Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:50.2983414Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:50.2983997Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:50.2984493Z ##[endgroup]
2025-12-04T15:43:50.3040175Z ##[group]Run actions/setup-python@v6
2025-12-04T15:43:50.3040451Z with:
2025-12-04T15:43:50.3040646Z   python-version: 3.10
2025-12-04T15:43:50.3040883Z   check-latest: false
2025-12-04T15:43:50.3041221Z   token: ***
2025-12-04T15:43:50.3041436Z   update-environment: true
2025-12-04T15:43:50.3041698Z   allow-prereleases: false
2025-12-04T15:43:50.3041953Z   freethreaded: false
2025-12-04T15:43:50.3042182Z env:
2025-12-04T15:43:50.3042375Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:50.3042610Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:50.3042903Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:50.3043453Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:50.3043942Z ##[endgroup]
2025-12-04T15:43:50.6599406Z ##[group]Installed versions
2025-12-04T15:43:50.6608558Z Version 3.10 was not found in the local cache
2025-12-04T15:43:50.6824310Z (node:242511) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
2025-12-04T15:43:50.6825096Z (Use `node --trace-deprecation ...` to show where the warning was created)
2025-12-04T15:43:51.1411054Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system.
The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json
2025-12-04T15:43:51.1625618Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main
2025-12-04T15:43:51.1626166Z with:
2025-12-04T15:43:51.1626365Z env:
2025-12-04T15:43:51.1626571Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:51.1626834Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:51.1627146Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:51.1627706Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:51.1628217Z ##[endgroup]
2025-12-04T15:43:51.1644421Z ##[group]Run set -eou pipefail
2025-12-04T15:43:51.1644735Z set -eou pipefail
2025-12-04T15:43:51.1657160Z 
2025-12-04T15:43:51.1657535Z echo "Holding runner for 2 hours until all ssh sessions have logged out"
2025-12-04T15:43:51.1658000Z for _ in $(seq 1440); do
2025-12-04T15:43:51.1658331Z     # Break if no ssh session exists anymore
2025-12-04T15:43:51.1658674Z     if [ "$(who)" = "" ]; then
2025-12-04T15:43:51.1658995Z       break
2025-12-04T15:43:51.1659287Z     fi
2025-12-04T15:43:51.1659499Z     echo "."
2025-12-04T15:43:51.1659738Z     sleep 5
2025-12-04T15:43:51.1659967Z done
2025-12-04T15:43:51.1669007Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:51.1669373Z env:
2025-12-04T15:43:51.1669579Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:51.1669830Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:51.1670138Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:51.1670708Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:51.1671202Z ##[endgroup]
2025-12-04T15:43:51.1702667Z Holding runner for 2 hours until all ssh sessions have logged out
2025-12-04T15:43:51.1796705Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T15:43:51.1797504Z # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T15:43:51.1798085Z # shellcheck disable=SC2046
2025-12-04T15:43:51.1798461Z docker stop $(docker ps -q) || true
2025-12-04T15:43:51.1798809Z # Prune all of the docker images
2025-12-04T15:43:51.1799121Z docker system prune -af
2025-12-04T15:43:51.1808767Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:43:51.1809141Z env:
2025-12-04T15:43:51.1809354Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:43:51.1809603Z   HAS_NVIDIA_GPU: true
2025-12-04T15:43:51.1809907Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:43:51.1810459Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:43:51.1810953Z ##[endgroup]
2025-12-04T15:44:02.6322008Z 5d0babf71ea3
2025-12-04T15:44:07.5055002Z Deleted Containers:
2025-12-04T15:44:07.5055493Z 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:44:07.5055831Z 
2025-12-04T15:44:20.2146344Z Deleted Images:
2025-12-04T15:44:20.2147529Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T15:44:20.2149090Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97
2025-12-04T15:44:20.2150025Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301
2025-12-04T15:44:20.2150675Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301
2025-12-04T15:44:20.2151333Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4
2025-12-04T15:44:20.2151970Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b
2025-12-04T15:44:20.2152861Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124
2025-12-04T15:44:20.2153505Z deleted: sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca
2025-12-04T15:44:20.2154394Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b
2025-12-04T15:44:20.2155044Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d
2025-12-04T15:44:20.2155884Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a
2025-12-04T15:44:20.2156520Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2
2025-12-04T15:44:20.2157258Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e
2025-12-04T15:44:20.2157953Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40
2025-12-04T15:44:20.2158577Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31
2025-12-04T15:44:20.2159223Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca
2025-12-04T15:44:20.2159875Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1
2025-12-04T15:44:20.2160505Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c
2025-12-04T15:44:20.2161154Z deleted: sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c
2025-12-04T15:44:20.2161811Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7
2025-12-04T15:44:20.2162460Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b
2025-12-04T15:44:20.2163107Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce
2025-12-04T15:44:20.2163759Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c
2025-12-04T15:44:20.2164520Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452
2025-12-04T15:44:20.2165162Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f
2025-12-04T15:44:20.2165801Z deleted: sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4
2025-12-04T15:44:20.2166473Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13
2025-12-04T15:44:20.2167207Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877
2025-12-04T15:44:20.2167846Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751
2025-12-04T15:44:20.2168492Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b
2025-12-04T15:44:20.2169127Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447
2025-12-04T15:44:20.2169841Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d
2025-12-04T15:44:20.2170572Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d
2025-12-04T15:44:20.2171222Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882
2025-12-04T15:44:20.2171866Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f
2025-12-04T15:44:20.2172496Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d
2025-12-04T15:44:20.2173132Z deleted: sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d
2025-12-04T15:44:20.2173768Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33
2025-12-04T15:44:20.2174405Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae
2025-12-04T15:44:20.2175045Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e
2025-12-04T15:44:20.2175699Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc
2025-12-04T15:44:20.2176506Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2
2025-12-04T15:44:20.2177433Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc
2025-12-04T15:44:20.2178285Z deleted: sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300
2025-12-04T15:44:20.2179037Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a
2025-12-04T15:44:20.2179905Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a
2025-12-04T15:44:20.2180540Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19
2025-12-04T15:44:20.2181221Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd
2025-12-04T15:44:20.2181870Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b
2025-12-04T15:44:20.2182543Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33
2025-12-04T15:44:20.2183193Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f
2025-12-04T15:44:20.2183846Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1
2025-12-04T15:44:20.2184489Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e
2025-12-04T15:44:20.2185142Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c
2025-12-04T15:44:20.2185797Z deleted: sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89
2025-12-04T15:44:20.2186450Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c
2025-12-04T15:44:20.2187109Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de
2025-12-04T15:44:20.2187839Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d
2025-12-04T15:44:20.2188473Z deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed
2025-12-04T15:44:20.2189113Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f
2025-12-04T15:44:20.2189758Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5
2025-12-04T15:44:20.2190385Z deleted: sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3
2025-12-04T15:44:20.2191033Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883
2025-12-04T15:44:20.2191682Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15
2025-12-04T15:44:20.2192326Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c
2025-12-04T15:44:20.2192986Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33
2025-12-04T15:44:20.2193645Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3
2025-12-04T15:44:20.2194300Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c
2025-12-04T15:44:20.2194940Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255
2025-12-04T15:44:20.2195568Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292
2025-12-04T15:44:20.2196212Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844
2025-12-04T15:44:20.2196883Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2
2025-12-04T15:44:20.2197702Z deleted: sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd
2025-12-04T15:44:20.2198514Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1
2025-12-04T15:44:20.2199234Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff
2025-12-04T15:44:20.2199877Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315
2025-12-04T15:44:20.2200516Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b
2025-12-04T15:44:20.2201275Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450
2025-12-04T15:44:20.2202323Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1
2025-12-04T15:44:20.2203134Z deleted: sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5
2025-12-04T15:44:20.2203933Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1
2025-12-04T15:44:20.2204837Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024
2025-12-04T15:44:20.2205739Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e
2025-12-04T15:44:20.2206538Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032
2025-12-04T15:44:20.2207401Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe
2025-12-04T15:44:20.2208503Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c
2025-12-04T15:44:20.2209297Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b
2025-12-04T15:44:20.2209967Z untagged: public.ecr.aws/docker/library/python:3.13
2025-12-04T15:44:20.2210875Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T15:44:20.2211934Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a
2025-12-04T15:44:20.2212762Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd
2025-12-04T15:44:20.2213574Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67
2025-12-04T15:44:20.2214385Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c
2025-12-04T15:44:20.2215217Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212
2025-12-04T15:44:20.2216035Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318
2025-12-04T15:44:20.2216840Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560
2025-12-04T15:44:20.2217670Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee
2025-12-04T15:44:20.2218159Z 
2025-12-04T15:44:20.2218308Z Total reclaimed space: 38.02GB
2025-12-04T15:44:20.2264029Z ##[group]Run set +e
2025-12-04T15:44:20.2264339Z set +e
2025-12-04T15:44:20.2264552Z set -x
2025-12-04T15:44:20.2264765Z 
2025-12-04T15:44:20.2264963Z nvidia-smi
2025-12-04T15:44:20.2265411Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in
2025-12-04T15:44:20.2266105Z # the case where the driver has already crashed as it still can get the driver version
2025-12-04T15:44:20.2266783Z # and some basic information like the bus ID.  However, the rest of the information
2025-12-04T15:44:20.2267294Z # would be missing (ERR!), for example:
2025-12-04T15:44:20.2267606Z #
2025-12-04T15:44:20.2267897Z # +-----------------------------------------------------------------------------+
2025-12-04T15:44:20.2268425Z # | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
2025-12-04T15:44:20.2268964Z # |-------------------------------+----------------------+----------------------+
2025-12-04T15:44:20.2269488Z # | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T15:44:20.2270074Z # | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T15:44:20.2270551Z # |                               |                      |               MIG M. |
2025-12-04T15:44:20.2270909Z # |===============================+======================+======================|
2025-12-04T15:44:20.2271320Z # |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |
2025-12-04T15:44:20.2271794Z # |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |
2025-12-04T15:44:20.2272248Z # |                               |                      |                 ERR! |
2025-12-04T15:44:20.2272689Z # +-------------------------------+----------------------+----------------------+
2025-12-04T15:44:20.2273064Z #
2025-12-04T15:44:20.2273353Z # +-----------------------------------------------------------------------------+
2025-12-04T15:44:20.2273903Z # | Processes:                                                                  |
2025-12-04T15:44:20.2274367Z # |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T15:44:20.2274818Z # |        ID   ID                                                   Usage      |
2025-12-04T15:44:20.2275263Z # |=============================================================================|
2025-12-04T15:44:20.2275681Z # +-----------------------------------------------------------------------------+
2025-12-04T15:44:20.2276050Z #
2025-12-04T15:44:20.2276434Z # This should be reported as a failure instead as it will guarantee to fail when
2025-12-04T15:44:20.2276944Z # Docker tries to run with --gpus all
2025-12-04T15:44:20.2277258Z #
2025-12-04T15:44:20.2277616Z # So, the correct check here is to query one of the missing piece of info like
2025-12-04T15:44:20.2278145Z # GPU name, so that the command can fail accordingly
2025-12-04T15:44:20.2278639Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T15:44:20.2279060Z NVIDIA_SMI_STATUS=$?
2025-12-04T15:44:20.2279325Z 
2025-12-04T15:44:20.2279772Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action
2025-12-04T15:44:20.2280437Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
2025-12-04T15:44:20.2281030Z   echo "NVIDIA driver installation has failed, shutting down the runner..."
2025-12-04T15:44:20.2281542Z   .github/scripts/stop_runner_service.sh
2025-12-04T15:44:20.2281867Z fi
2025-12-04T15:44:20.2282063Z 
2025-12-04T15:44:20.2282583Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the
2025-12-04T15:44:20.2283211Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails
2025-12-04T15:44:20.2283741Z # https://github.com/pytorch/test-infra/issues/4000
2025-12-04T15:44:20.2284171Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
2025-12-04T15:44:20.2284526Z NVIDIA_SMI_STATUS=$?
2025-12-04T15:44:20.2284795Z 
2025-12-04T15:44:20.2285214Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action
2025-12-04T15:44:20.2285855Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
2025-12-04T15:44:20.2286436Z   echo "NVIDIA driver installation has failed, shutting down the runner..."
2025-12-04T15:44:20.2286939Z   .github/scripts/stop_runner_service.sh
2025-12-04T15:44:20.2287248Z fi
2025-12-04T15:44:20.2287449Z 
2025-12-04T15:44:20.2287689Z # Check the GPU count to be a power of 2
2025-12-04T15:44:20.2288240Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then
2025-12-04T15:44:20.2288985Z   echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..."
2025-12-04T15:44:20.2289559Z   .github/scripts/stop_runner_service.sh
2025-12-04T15:44:20.2289882Z fi
2025-12-04T15:44:20.2300596Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:44:20.2300962Z env:
2025-12-04T15:44:20.2301162Z   GIT_DEFAULT_BRANCH: main
2025-12-04T15:44:20.2301410Z   HAS_NVIDIA_GPU: true
2025-12-04T15:44:20.2301713Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T15:44:20.2302314Z   DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7
2025-12-04T15:44:20.2302816Z ##[endgroup]
2025-12-04T15:44:20.2340785Z + nvidia-smi
2025-12-04T15:44:20.2574496Z Thu Dec  4 15:44:20 2025       
2025-12-04T15:44:20.2575023Z +-----------------------------------------------------------------------------------------+
2025-12-04T15:44:20.2575824Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T15:44:20.2576344Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T15:44:20.2576917Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T15:44:20.2577587Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T15:44:20.2578040Z |                                         |                        |               MIG M. |
2025-12-04T15:44:20.2578398Z |=========================================+========================+======================|
2025-12-04T15:44:20.2791097Z |   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
2025-12-04T15:44:20.2791730Z |  0%   21C    P8             10W /  300W |       0MiB /  23028MiB |      0%      Default |
2025-12-04T15:44:20.2792232Z |                                         |                        |                  N/A |
2025-12-04T15:44:20.2792706Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T15:44:20.2796191Z 
2025-12-04T15:44:20.2796666Z +-----------------------------------------------------------------------------------------+
2025-12-04T15:44:20.2797286Z | Processes:                                                                              |
2025-12-04T15:44:20.2797887Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T15:44:20.2798377Z |        ID   ID                                                               Usage      |
2025-12-04T15:44:20.2799008Z |=========================================================================================|
2025-12-04T15:44:20.2803398Z |  No running processes found                                                             |
2025-12-04T15:44:20.2803935Z +-----------------------------------------------------------------------------------------+
2025-12-04T15:44:20.5367765Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T15:44:20.5542178Z NVIDIA A10G
2025-12-04T15:44:20.5587818Z + NVIDIA_SMI_STATUS=0
2025-12-04T15:44:20.5588150Z + '[' 0 -ne 0 ']'
2025-12-04T15:44:20.5594968Z ++ nvidia-smi --list-gpus
2025-12-04T15:44:20.5595740Z ++ wc -l
2025-12-04T15:44:20.5820868Z + GPU_COUNT=1
2025-12-04T15:44:20.5821221Z + NVIDIA_SMI_STATUS=0
2025-12-04T15:44:20.5821528Z + '[' 0 -ne 0 ']'
2025-12-04T15:44:20.5821745Z + '[' 1 -le 8 ']'
2025-12-04T15:44:20.5821959Z + '[' 1 -ne 1 ']'
2025-12-04T15:44:20.5889994Z Post job cleanup.
2025-12-04T15:44:20.5966086Z Post job cleanup.
2025-12-04T15:44:20.6010946Z Post job cleanup.
2025-12-04T15:44:20.7058142Z [command]/usr/bin/git version
2025-12-04T15:44:20.7127338Z git version 2.50.1
2025-12-04T15:44:20.7164989Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/11dd59f0-aa5a-483e-a1a9-e62eb03c751e/.gitconfig'
2025-12-04T15:44:20.7173900Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/11dd59f0-aa5a-483e-a1a9-e62eb03c751e' before making global git config changes
2025-12-04T15:44:20.7174873Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T15:44:20.7179352Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T15:44:20.7227686Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T15:44:20.7275423Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T15:44:20.7690309Z Entering 'android/libs/fbjni'
2025-12-04T15:44:20.7773061Z Entering 'third_party/FP16'
2025-12-04T15:44:20.7854134Z Entering 'third_party/FXdiv'
2025-12-04T15:44:20.7946185Z Entering 'third_party/NNPACK'
2025-12-04T15:44:20.8030932Z Entering 'third_party/NVTX'
2025-12-04T15:44:20.8113215Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T15:44:20.8195100Z Entering 'third_party/XNNPACK'
2025-12-04T15:44:20.8294452Z Entering 'third_party/aiter'
2025-12-04T15:44:20.8376453Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T15:44:20.8466640Z Entering 'third_party/benchmark'
2025-12-04T15:44:20.8548654Z Entering 'third_party/composable_kernel'
2025-12-04T15:44:20.8639903Z Entering 'third_party/cpp-httplib'
2025-12-04T15:44:20.8720808Z Entering 'third_party/cpuinfo'
2025-12-04T15:44:20.8801595Z Entering 'third_party/cudnn_frontend'
2025-12-04T15:44:20.8882784Z Entering 'third_party/cutlass'
2025-12-04T15:44:20.8977651Z Entering 'third_party/fbgemm'
2025-12-04T15:44:20.9060980Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T15:44:20.9138383Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T15:44:20.9224644Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T15:44:20.9301201Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T15:44:20.9388901Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T15:44:20.9473258Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T15:44:20.9544649Z Entering 'third_party/fbgemm/external/json'
2025-12-04T15:44:20.9632900Z Entering 'third_party/flash-attention'
2025-12-04T15:44:20.9713155Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T15:44:20.9793572Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T15:44:20.9886504Z Entering 'third_party/flatbuffers'
2025-12-04T15:44:20.9971021Z Entering 'third_party/fmt'
2025-12-04T15:44:21.0051555Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T15:44:21.0132681Z Entering 'third_party/gloo'
2025-12-04T15:44:21.0213045Z Entering 'third_party/googletest'
2025-12-04T15:44:21.0292463Z Entering 'third_party/ideep'
2025-12-04T15:44:21.0371483Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T15:44:21.0458439Z Entering 'third_party/ittapi'
2025-12-04T15:44:21.0541359Z Entering 'third_party/kineto'
2025-12-04T15:44:21.0624878Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T15:44:21.0700094Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T15:44:21.0786978Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T15:44:21.0865870Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T15:44:21.0944262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T15:44:21.1024715Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T15:44:21.1106925Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T15:44:21.1185413Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T15:44:21.1266019Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T15:44:21.1345495Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T15:44:21.1422779Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T15:44:21.1501357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:21.1583382Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:21.1670288Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T15:44:21.1751210Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T15:44:21.1834283Z Entering 'third_party/kleidiai'
2025-12-04T15:44:21.1913858Z Entering 'third_party/mimalloc'
2025-12-04T15:44:21.1992008Z Entering 'third_party/nlohmann'
2025-12-04T15:44:21.2072621Z Entering 'third_party/onnx'
2025-12-04T15:44:21.2171176Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T15:44:21.2256324Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T15:44:21.2339521Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T15:44:21.2423309Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T15:44:21.2500133Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T15:44:21.2580934Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T15:44:21.2660997Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T15:44:21.2738166Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T15:44:21.2815395Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T15:44:21.2890811Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:21.2970720Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:21.3055809Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T15:44:21.3157378Z Entering 'third_party/pocketfft'
2025-12-04T15:44:21.3236948Z Entering 'third_party/protobuf'
2025-12-04T15:44:21.3318284Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T15:44:21.3394868Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T15:44:21.3479178Z Entering 'third_party/psimd'
2025-12-04T15:44:21.3557478Z Entering 'third_party/pthreadpool'
2025-12-04T15:44:21.3635326Z Entering 'third_party/pybind11'
2025-12-04T15:44:21.3714176Z Entering 'third_party/python-peachpy'
2025-12-04T15:44:21.3794184Z Entering 'third_party/sleef'
2025-12-04T15:44:21.3872973Z Entering 'third_party/tensorpipe'
2025-12-04T15:44:21.3953581Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T15:44:21.4032468Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T15:44:21.4108867Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T15:44:21.4186483Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T15:44:21.4261863Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T15:44:21.4374500Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T15:44:21.4403537Z http.https://github.com/.extraheader
2025-12-04T15:44:21.4421388Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2025-12-04T15:44:21.4465837Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T15:44:21.4866257Z Entering 'android/libs/fbjni'
2025-12-04T15:44:21.4919279Z http.https://github.com/.extraheader
2025-12-04T15:44:21.4968669Z Entering 'third_party/FP16'
2025-12-04T15:44:21.5021973Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5071744Z Entering 'third_party/FXdiv'
2025-12-04T15:44:21.5126349Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5175274Z Entering 'third_party/NNPACK'
2025-12-04T15:44:21.5229629Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5279825Z Entering 'third_party/NVTX'
2025-12-04T15:44:21.5332422Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5383478Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T15:44:21.5439733Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5488808Z Entering 'third_party/XNNPACK'
2025-12-04T15:44:21.5541486Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5611877Z Entering 'third_party/aiter'
2025-12-04T15:44:21.5663473Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5714216Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T15:44:21.5765148Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5829765Z Entering 'third_party/benchmark'
2025-12-04T15:44:21.5881317Z http.https://github.com/.extraheader
2025-12-04T15:44:21.5932646Z Entering 'third_party/composable_kernel'
2025-12-04T15:44:21.5984659Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6044026Z Entering 'third_party/cpp-httplib'
2025-12-04T15:44:21.6099234Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6150083Z Entering 'third_party/cpuinfo'
2025-12-04T15:44:21.6206427Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6257544Z Entering 'third_party/cudnn_frontend'
2025-12-04T15:44:21.6314051Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6365223Z Entering 'third_party/cutlass'
2025-12-04T15:44:21.6416368Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6475465Z Entering 'third_party/fbgemm'
2025-12-04T15:44:21.6528267Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6580987Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T15:44:21.6628983Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6678190Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T15:44:21.6728689Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6786669Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T15:44:21.6839066Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6889342Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T15:44:21.6940014Z http.https://github.com/.extraheader
2025-12-04T15:44:21.6998731Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T15:44:21.7048210Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7097541Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T15:44:21.7149165Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7197227Z Entering 'third_party/fbgemm/external/json'
2025-12-04T15:44:21.7249384Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7304327Z Entering 'third_party/flash-attention'
2025-12-04T15:44:21.7356868Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7406425Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T15:44:21.7462853Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7519201Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T15:44:21.7569308Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7629783Z Entering 'third_party/flatbuffers'
2025-12-04T15:44:21.7683100Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7734595Z Entering 'third_party/fmt'
2025-12-04T15:44:21.7786906Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7837660Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T15:44:21.7889657Z http.https://github.com/.extraheader
2025-12-04T15:44:21.7941178Z Entering 'third_party/gloo'
2025-12-04T15:44:21.7992784Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8043600Z Entering 'third_party/googletest'
2025-12-04T15:44:21.8094186Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8144739Z Entering 'third_party/ideep'
2025-12-04T15:44:21.8196510Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8246674Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T15:44:21.8303224Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8362831Z Entering 'third_party/ittapi'
2025-12-04T15:44:21.8416286Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8467404Z Entering 'third_party/kineto'
2025-12-04T15:44:21.8523839Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8572484Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T15:44:21.8627908Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8676199Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T15:44:21.8727804Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8779409Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T15:44:21.8830109Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8881082Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T15:44:21.8931358Z http.https://github.com/.extraheader
2025-12-04T15:44:21.8981988Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T15:44:21.9033254Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9080154Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T15:44:21.9132211Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9188219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T15:44:21.9238995Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9289392Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T15:44:21.9341384Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9395609Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T15:44:21.9446166Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9497649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T15:44:21.9551195Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9601974Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T15:44:21.9652196Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9702322Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:21.9753557Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9806473Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:21.9858402Z http.https://github.com/.extraheader
2025-12-04T15:44:21.9916905Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T15:44:21.9967358Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0016604Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T15:44:22.0066471Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0121347Z Entering 'third_party/kleidiai'
2025-12-04T15:44:22.0173600Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0228262Z Entering 'third_party/mimalloc'
2025-12-04T15:44:22.0280490Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0334435Z Entering 'third_party/nlohmann'
2025-12-04T15:44:22.0385743Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0439942Z Entering 'third_party/onnx'
2025-12-04T15:44:22.0491241Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0557897Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T15:44:22.0609016Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0663585Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T15:44:22.0714923Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0765903Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T15:44:22.0815853Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0866432Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T15:44:22.0913712Z http.https://github.com/.extraheader
2025-12-04T15:44:22.0963584Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T15:44:22.1018662Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1066318Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T15:44:22.1115514Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1165711Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T15:44:22.1214552Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1263220Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T15:44:22.1317379Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1366027Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T15:44:22.1415382Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1463324Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:22.1513922Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1565413Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:22.1615747Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1669358Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T15:44:22.1718040Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1796964Z Entering 'third_party/pocketfft'
2025-12-04T15:44:22.1849018Z http.https://github.com/.extraheader
2025-12-04T15:44:22.1898610Z Entering 'third_party/protobuf'
2025-12-04T15:44:22.1952032Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2007030Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T15:44:22.2062761Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2112835Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T15:44:22.2163371Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2217657Z Entering 'third_party/psimd'
2025-12-04T15:44:22.2268914Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2320349Z Entering 'third_party/pthreadpool'
2025-12-04T15:44:22.2373094Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2423961Z Entering 'third_party/pybind11'
2025-12-04T15:44:22.2475223Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2525352Z Entering 'third_party/python-peachpy'
2025-12-04T15:44:22.2576330Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2629999Z Entering 'third_party/sleef'
2025-12-04T15:44:22.2680455Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2730576Z Entering 'third_party/tensorpipe'
2025-12-04T15:44:22.2781742Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2831370Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T15:44:22.2879249Z http.https://github.com/.extraheader
2025-12-04T15:44:22.2930978Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T15:44:22.2985265Z http.https://github.com/.extraheader
2025-12-04T15:44:22.3033414Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T15:44:22.3082755Z http.https://github.com/.extraheader
2025-12-04T15:44:22.3133853Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T15:44:22.3183551Z http.https://github.com/.extraheader
2025-12-04T15:44:22.3232392Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T15:44:22.3283562Z http.https://github.com/.extraheader
2025-12-04T15:44:22.3370502Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.3420971Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T15:44:22.3827623Z Entering 'android/libs/fbjni'
2025-12-04T15:44:22.3863426Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T15:44:22.3887938Z Entering 'third_party/FP16'
2025-12-04T15:44:22.3923556Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T15:44:22.3948661Z Entering 'third_party/FXdiv'
2025-12-04T15:44:22.3984155Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T15:44:22.4009440Z Entering 'third_party/NNPACK'
2025-12-04T15:44:22.4043987Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T15:44:22.4070925Z Entering 'third_party/NVTX'
2025-12-04T15:44:22.4107004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T15:44:22.4134668Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T15:44:22.4169827Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T15:44:22.4194936Z Entering 'third_party/XNNPACK'
2025-12-04T15:44:22.4230198Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T15:44:22.4271141Z Entering 'third_party/aiter'
2025-12-04T15:44:22.4306167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T15:44:22.4333550Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T15:44:22.4367023Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T15:44:22.4401275Z Entering 'third_party/benchmark'
2025-12-04T15:44:22.4436634Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T15:44:22.4461964Z Entering 'third_party/composable_kernel'
2025-12-04T15:44:22.4496812Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T15:44:22.4530737Z Entering 'third_party/cpp-httplib'
2025-12-04T15:44:22.4567090Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T15:44:22.4592293Z Entering 'third_party/cpuinfo'
2025-12-04T15:44:22.4627648Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T15:44:22.4653822Z Entering 'third_party/cudnn_frontend'
2025-12-04T15:44:22.4688783Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T15:44:22.4714876Z Entering 'third_party/cutlass'
2025-12-04T15:44:22.4749938Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T15:44:22.4784199Z Entering 'third_party/fbgemm'
2025-12-04T15:44:22.4819718Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T15:44:22.4847395Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T15:44:22.4880004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T15:44:22.4905432Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T15:44:22.4938389Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T15:44:22.4971392Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T15:44:22.5004544Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T15:44:22.5029508Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T15:44:22.5062003Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T15:44:22.5095903Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T15:44:22.5128136Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T15:44:22.5151609Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T15:44:22.5187600Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T15:44:22.5210556Z Entering 'third_party/fbgemm/external/json'
2025-12-04T15:44:22.5243889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T15:44:22.5271172Z Entering 'third_party/flash-attention'
2025-12-04T15:44:22.5306398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T15:44:22.5331654Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T15:44:22.5363147Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T15:44:22.5393717Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T15:44:22.5426952Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T15:44:22.5463584Z Entering 'third_party/flatbuffers'
2025-12-04T15:44:22.5498725Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T15:44:22.5527490Z Entering 'third_party/fmt'
2025-12-04T15:44:22.5562392Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T15:44:22.5587884Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T15:44:22.5624956Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T15:44:22.5650920Z Entering 'third_party/gloo'
2025-12-04T15:44:22.5686592Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T15:44:22.5712390Z Entering 'third_party/googletest'
2025-12-04T15:44:22.5747106Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T15:44:22.5772445Z Entering 'third_party/ideep'
2025-12-04T15:44:22.5809550Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T15:44:22.5831544Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T15:44:22.5867215Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T15:44:22.5901278Z Entering 'third_party/ittapi'
2025-12-04T15:44:22.5937474Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T15:44:22.5962809Z Entering 'third_party/kineto'
2025-12-04T15:44:22.5997466Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T15:44:22.6021378Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T15:44:22.6055496Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T15:44:22.6078504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T15:44:22.6112349Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T15:44:22.6139545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T15:44:22.6173864Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T15:44:22.6198565Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T15:44:22.6232507Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T15:44:22.6257005Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T15:44:22.6290258Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T15:44:22.6312997Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T15:44:22.6347486Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T15:44:22.6375944Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T15:44:22.6409882Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T15:44:22.6433522Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T15:44:22.6466361Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T15:44:22.6491369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T15:44:22.6530956Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T15:44:22.6556505Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T15:44:22.6589859Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T15:44:22.6614677Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T15:44:22.6650338Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T15:44:22.6680134Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:22.6706824Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T15:44:22.6734391Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:22.6768362Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T15:44:22.6799879Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T15:44:22.6833122Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T15:44:22.6856907Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T15:44:22.6891776Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T15:44:22.6919439Z Entering 'third_party/kleidiai'
2025-12-04T15:44:22.6955410Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T15:44:22.6980874Z Entering 'third_party/mimalloc'
2025-12-04T15:44:22.7022014Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T15:44:22.7047187Z Entering 'third_party/nlohmann'
2025-12-04T15:44:22.7082134Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T15:44:22.7109063Z Entering 'third_party/onnx'
2025-12-04T15:44:22.7148486Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T15:44:22.7191814Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T15:44:22.7226679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T15:44:22.7259411Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T15:44:22.7297102Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T15:44:22.7322016Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T15:44:22.7353889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T15:44:22.7378115Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T15:44:22.7412749Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T15:44:22.7439126Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T15:44:22.7469646Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T15:44:22.7493990Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T15:44:22.7527720Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T15:44:22.7553482Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T15:44:22.7588234Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T15:44:22.7611456Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T15:44:22.7646431Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T15:44:22.7668340Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T15:44:22.7701351Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T15:44:22.7725118Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:22.7759271Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T15:44:22.7785319Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:22.7820219Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T15:44:22.7846911Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T15:44:22.7880145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T15:44:22.7927987Z Entering 'third_party/pocketfft'
2025-12-04T15:44:22.7963473Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T15:44:22.7989916Z Entering 'third_party/protobuf'
2025-12-04T15:44:22.8025566Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T15:44:22.8051988Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T15:44:22.8084507Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T15:44:22.8110771Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T15:44:22.8144906Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T15:44:22.8173188Z Entering 'third_party/psimd'
2025-12-04T15:44:22.8209637Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T15:44:22.8236385Z Entering 'third_party/pthreadpool'
2025-12-04T15:44:22.8273618Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T15:44:22.8298736Z Entering 'third_party/pybind11'
2025-12-04T15:44:22.8334491Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T15:44:22.8360408Z Entering 'third_party/python-peachpy'
2025-12-04T15:44:22.8395640Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T15:44:22.8421658Z Entering 'third_party/sleef'
2025-12-04T15:44:22.8456901Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T15:44:22.8482707Z Entering 'third_party/tensorpipe'
2025-12-04T15:44:22.8518870Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T15:44:22.8543783Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T15:44:22.8575831Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T15:44:22.8600490Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T15:44:22.8634303Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T15:44:22.8657796Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T15:44:22.8690047Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T15:44:22.8714407Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T15:44:22.8746796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T15:44:22.8768705Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T15:44:22.8804048Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T15:44:22.8862240Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8896854Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8930570Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8965214Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8999190Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9033908Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9070123Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9103414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9137665Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9171181Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9205590Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9241126Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9274701Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9310042Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9343312Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9375098Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9407501Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9440611Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9473011Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9505234Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9539340Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9575372Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9616316Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9649610Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9683030Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9717067Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9750281Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9785662Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9821712Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9854696Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9888213Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9920956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9954985Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9989294Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0023730Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0057577Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0092467Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0130515Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0165882Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0201633Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0236249Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0271200Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0305476Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0341807Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0375368Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0409036Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0442811Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0476328Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0509519Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0544457Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0578171Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0611203Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0645231Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0678895Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0714320Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0748211Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0782580Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0817562Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0851316Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0886953Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0921621Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0956220Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0991347Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1029210Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1065278Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1100841Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1137154Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1173831Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1210465Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1244537Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1279572Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1314761Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1349919Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1385561Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1421269Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1455907Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1490615Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1526258Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1561512Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1599070Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1634059Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1793604Z A job completed hook has been configured by the self-hosted runner administrator
2025-12-04T15:44:23.1813025Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-12-04T15:44:23.1821253Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:44:23.1821746Z ##[endgroup]
2025-12-04T15:44:31.2428163Z Cleaning up orphan processes